The Numerical Evaluation of Maximum-Likelihood
Estimates of the Parameters for a Mixture of Normal Distributions
from Partially Identified Samples

by 

Homer F. Walker

Department of Mathematics, University of Houston 
Houston, Texas 77004 


1. Introduction.

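Let $\pi_1, \ldots, \pi_m$ be populations whose multivariate observations in $\mathbb{R}^n$ are distributed with respective normal density functions

$$p_i(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_i^0|^{1/2}} \exp\left[-\tfrac{1}{2}\,(x-\mu_i^0)^T (\Sigma_i^0)^{-1} (x-\mu_i^0)\right], \qquad i = 1, \ldots, m.$$

If $\pi_0$ is a given mixture of members of these populations, then observations on $\pi_0$ are distributed in $\mathbb{R}^n$ with density function

$$p(x) = \sum_{i=1}^{m} \alpha_i^0\, p_i(x)$$

for an appropriate set of proportions $\{\alpha_i^0\}_{i=1,\ldots,m}$. These proportions necessarily satisfy $\sum_{i=1}^{m} \alpha_i^0 = 1$ and $\alpha_i^0 \ge 0$, $i = 1, \ldots, m$. In this note, we also assume that each $\alpha_i^0$ is strictly positive.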

We address here the problem of numerically approximating the maximum-likelihood estimates of the parameters $\{\alpha_i^0, \mu_i^0, \Sigma_i^0\}_{i=1,\ldots,m}$ determined by samples of two types. Samples of both types consist of sets $\{x_{ik}\}_{k=1,\ldots,N_i}$ of independent observations on $\pi_i$, $i = 0, \ldots, m$. (The sets $\{x_{ik}\}_{k=1,\ldots,N_i}$, $i = 1, \ldots, m$, comprise the identified observations of such samples, and such samples are said to be partially identified.) We distinguish samples of the two types according to whether the numbers $N_i$ of identified observations contain information about the proportions $\alpha_i^0$, $i = 1, \ldots, m$. If the numbers of identified observations contain no information about the proportions, then the sample is of the first type; otherwise, the sample is of the second type. The following are examples of how samples of the first and second types, respectively, might be obtained:

(1) For $i = 0, \ldots, m$, numbers $N_i$ are arbitrarily chosen and independent observations $\{x_{ik}\}_{k=1,\ldots,N_i}$ are obtained from $\pi_i$.

(2) A number $K_0$ of observations are obtained from $\pi_0$. For some $N_0 < K_0$, $N_0$ of these observations are left unidentified, while the remaining $K_0 - N_0$ observations are identified. For $i = 1, \ldots, m$, a subset $\{x_{ik}\}_{k=1,\ldots,N_i}$ of the identified observations is determined whose member observations come from $\pi_i$.

In the following, we consider likelihood equations determined by the two types of samples which are necessary conditions for a maximum-likelihood estimate. These equations, which were derived by Coberly [1], suggest certain successive-approximations iterative procedures for obtaining maximum-likelihood estimates. These procedures, which are generalized steepest ascent (deflected gradient) procedures, contain those of Hosmer [2] as a special case. Using arguments that parallel those of [3], we show that, with probability 1 as $N_0$ approaches infinity (regardless of the relative sizes of $N_0$ and $N_i$, $i = 1, \ldots, m$), these procedures converge locally to the strongly consistent maximum-likelihood estimates whenever the step-size is between 0 and 2. Furthermore, the value of the step-size which yields optimal local convergence rates is bounded from below by a number which always lies between 1 and 2.


2. Samples of the first type.

We first assume that numbers $\{N_i\}_{i=0,\ldots,m}$ are given and that, for $i = 0, \ldots, m$, $\{x_{ik}\}_{k=1,\ldots,N_i}$ is a set of $N_i$ independent observations drawn on $\pi_i$. The log-likelihood function for a sample of this type is

$$L_1 = \sum_{i=1}^{m} \sum_{k=1}^{N_i} \log p_i(x_{ik}) + \sum_{k=1}^{N_0} \log p(x_{0k}).$$

In this expression, the parameter vector $\Theta$ (with components $\alpha_i$, $\mu_i$, $\Sigma_i$, $i = 1, \ldots, m$) belongs to the vector space defined in [3], and the density functions on the right-hand side are evaluated with the true parameter vector $\Theta^0$ (with components $\alpha_i^0$, $\mu_i^0$, $\Sigma_i^0$, $i = 1, \ldots, m$) replaced by $\Theta$.

As in [3], one can show that, given any sufficiently small neighborhood of the true parameters, there is, with probability 1 as $N_0$ approaches infinity (regardless of the relative sizes of $N_0$ and $N_i$, $i = 1, \ldots, m$), a unique solution of the likelihood equations for either type of sample in that neighborhood, and this solution is a maximum-likelihood estimate.
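Differentiating the log-likelihood function and setting its partial derivatives to zero gives the likelihood equations

$$\text{(1.a)}\qquad \alpha_i = A_i(\Theta) \equiv \frac{\alpha_i}{N_0} \sum_{k=1}^{N_0} \frac{p_i(x_{0k})}{p(x_{0k})}$$

$$\text{(1.b)}\qquad \mu_i = M_i(\Theta) \equiv \left\{ \sum_{k=1}^{N_i} x_{ik} + \sum_{k=1}^{N_0} x_{0k}\, \frac{\alpha_i p_i(x_{0k})}{p(x_{0k})} \right\} \Bigg/ \left\{ N_i + \sum_{k=1}^{N_0} \frac{\alpha_i p_i(x_{0k})}{p(x_{0k})} \right\}$$

$$\text{(1.c)}\qquad \Sigma_i = S_i(\Theta) \equiv \left\{ \sum_{k=1}^{N_i} (x_{ik}-\mu_i)(x_{ik}-\mu_i)^T + \sum_{k=1}^{N_0} (x_{0k}-\mu_i)(x_{0k}-\mu_i)^T\, \frac{\alpha_i p_i(x_{0k})}{p(x_{0k})} \right\} \Bigg/ \left\{ N_i + \sum_{k=1}^{N_0} \frac{\alpha_i p_i(x_{0k})}{p(x_{0k})} \right\}$$

for $i = 1, \ldots, m$.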
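We set

$$A(\Theta) = \begin{pmatrix} A_1(\Theta) \\ \vdots \\ A_m(\Theta) \end{pmatrix}, \qquad M(\Theta) = \begin{pmatrix} M_1(\Theta) \\ \vdots \\ M_m(\Theta) \end{pmatrix}, \qquad S(\Theta) = \begin{pmatrix} S_1(\Theta) \\ \vdots \\ S_m(\Theta) \end{pmatrix},$$

and define an operator $\Phi_\varepsilon$ on the parameter space by

$$\Phi_\varepsilon(\Theta) = (1 - \varepsilon)\,\Theta + \varepsilon \begin{pmatrix} A(\Theta) \\ M(\Theta) \\ S(\Theta) \end{pmatrix}.$$

Clearly, for any non-zero $\varepsilon$, the likelihood equations are satisfied by a vector $\Theta$ if and only if $\Theta = \Phi_\varepsilon(\Theta)$.

We consider the following iterative procedure: Beginning with some starting value $\Theta^{(1)}$, define successive iterates inductively by

$$\text{(2)}\qquad \Theta^{(j+1)} = \Phi_\varepsilon(\Theta^{(j)})$$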
for $j = 1, 2, 3, \ldots$. Our local convergence result for this iterative procedure, as stated in the introduction, follows immediately from the theorem below.
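The relaxed iteration can be sketched for a univariate, two-component mixture. This is only an illustration under stated assumptions: the function names, the synthetic data, and the reading of the maps A, M, S as posterior-weighted averages over the unidentified observations (with the identified observations entering M and S at full weight) are choices made here, not part of the original note.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def likelihood_map(theta, identified, unidentified):
    """Evaluate (A(theta), M(theta), S(theta)) for a first-type sample."""
    alpha, mu, var = theta
    m, n0 = len(alpha), len(unidentified)
    # Mixture density p(x_0k) at every unidentified observation.
    mix = [sum(alpha[j] * normal_pdf(x, mu[j], var[j]) for j in range(m))
           for x in unidentified]
    new_alpha, new_mu, new_var = [], [], []
    for i in range(m):
        # w[k] = alpha_i * p_i(x_0k) / p(x_0k)
        w = [alpha[i] * normal_pdf(x, mu[i], var[i]) / p
             for x, p in zip(unidentified, mix)]
        denom = len(identified[i]) + sum(w)
        new_alpha.append(sum(w) / n0)
        new_mu.append((sum(identified[i])
                       + sum(wk * x for wk, x in zip(w, unidentified))) / denom)
        new_var.append((sum((x - mu[i]) ** 2 for x in identified[i])
                        + sum(wk * (x - mu[i]) ** 2
                              for wk, x in zip(w, unidentified))) / denom)
    return new_alpha, new_mu, new_var


def relaxed_step(theta, identified, unidentified, eps):
    """One application of Phi_eps(theta) = (1 - eps) * theta + eps * G(theta)."""
    image = likelihood_map(theta, identified, unidentified)
    return tuple([(1.0 - eps) * old + eps * new for old, new in zip(t_old, t_new)]
                 for t_old, t_new in zip(theta, image))


def iterate(theta, identified, unidentified, eps=1.0, steps=200):
    """Apply the relaxed step a fixed number of times."""
    for _ in range(steps):
        theta = relaxed_step(theta, identified, unidentified, eps)
    return theta


def demo_sample(seed=0):
    """Synthetic sample: 30 identified points per class, 400 unidentified."""
    rng = random.Random(seed)
    identified = [[rng.gauss(0.0, 1.0) for _ in range(30)],
                  [rng.gauss(4.0, 1.0) for _ in range(30)]]
    unidentified = [rng.gauss(0.0, 1.0) if rng.random() < 0.4
                    else rng.gauss(4.0, 1.0) for _ in range(400)]
    return identified, unidentified
```

With eps = 1 the step reduces to plain successive substitution; values of eps between 1 and 2 over-relax it. The proportions keep summing to one at every step because the components of A sum to one.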

Theorem 1: With probability 1 as $N_0$ approaches infinity, $\Phi_\varepsilon$ is a locally contractive operator (in some norm on the parameter space) near the strongly consistent maximum-likelihood estimate whenever $0 < \varepsilon < 2$.

In saying that $\Phi_\varepsilon$ is a locally contractive operator near a point $\Theta$ of the parameter space, we mean that there is a vector norm $\|\cdot\|$ on that space and a number $\lambda$, $0 \le \lambda < 1$, such that

$$\|\Phi_\varepsilon(\Theta') - \Theta\| \le \lambda\, \|\Theta' - \Theta\|$$

whenever $\Theta'$ lies sufficiently near $\Theta$.

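Proof of Theorem 1: Let

$$\Theta = (\alpha_1, \ldots, \alpha_m, \mu_1, \ldots, \mu_m, \Sigma_1, \ldots, \Sigma_m)^T$$

be the strongly consistent maximum-likelihood estimate. We assume that $\alpha_i > 0$, $i = 1, \ldots, m$. (As $N_0$ approaches infinity, the probability is 1 that this is the case.) As in [3], it suffices to show that, with probability 1, $\nabla\Phi_\varepsilon(\Theta)$ converges to an operator which has operator norm less than 1 with respect to a suitable vector norm on the parameter space. Now

$$\nabla\Phi_\varepsilon(\Theta) = (1 - \varepsilon)\, I + \varepsilon\, \nabla \begin{pmatrix} A(\Theta) \\ M(\Theta) \\ S(\Theta) \end{pmatrix},$$

and we write

$$\nabla \begin{pmatrix} A \\ M \\ S \end{pmatrix} = \begin{pmatrix} \nabla_\alpha A & \nabla_\mu A & \nabla_\Sigma A \\ \nabla_\alpha M & \nabla_\mu M & \nabla_\Sigma M \\ \nabla_\alpha S & \nabla_\mu S & \nabla_\Sigma S \end{pmatrix}$$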

as in [3]. Setting $\beta_i(x) = p_i(x)/p(x)$ for $i = 1, \ldots, m$, and writing $\beta = (\beta_1, \ldots, \beta_m)^T$, one calculates at a solution of (1.a)

$$\nabla_\alpha A(\Theta) = I - (\operatorname{diag} \alpha_i)\, \frac{1}{N_0} \sum_{k=1}^{N_0} \beta(x_{0k})\, \beta(x_{0k})^T.$$

[The definitions of the remaining auxiliary quantities $\gamma_i$, $\delta_i$, $K_i$ and the displayed expressions for the other eight blocks $\nabla_\mu A(\Theta), \ldots, \nabla_\Sigma S(\Theta)$ are illegible in the source scan.]

Here, the arguments of the functions appearing in the summands are to be determined from the indices of summation.
Evaluating these derivatives at $\Theta$, one obtains an expression for

$$\nabla \begin{pmatrix} A \\ M \\ S \end{pmatrix}$$

in terms of block-diagonal factors and blocks $B_{21}$, $B_{22}$, $B_{23}$, $B_{31}$, $B_{32}$, $B_{33}$. [The displayed matrices and the definitions of the blocks $B_{jk}$ are illegible in the source scan.]
REPRODUCIBILITY OF THE ORIGINAL PAGE IS POOR.


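We have assumed that $\Theta$ is the strongly consistent maximum-likelihood estimate. Then, regardless of the relative sizes of $N_0$ and $N_i$, one can show as in [3] that, with probability 1, $\{\nabla\Phi_\varepsilon(\Theta) - E(\nabla\Phi_\varepsilon(\Theta^0))\}$ converges to zero as $N_0$ approaches infinity. Now

$$E\left( \nabla \begin{pmatrix} A(\Theta^0) \\ M(\Theta^0) \\ S(\Theta^0) \end{pmatrix} \right) = B(I - QR),$$

where

$$B = \begin{pmatrix} I & 0 & 0 \\ 0 & \left(\operatorname{diag} \dfrac{\alpha_i^0 N_0}{N_i + \alpha_i^0 N_0}\, I\right) & 0 \\ 0 & 0 & \left(\operatorname{diag} \dfrac{\alpha_i^0 N_0}{N_i + \alpha_i^0 N_0}\, I\right) \end{pmatrix},$$

$Q$ is the block-diagonal operator of [3] whose leading block is $(\operatorname{diag} \alpha_i^0)$, and

$$R = \int_{\mathbb{R}^n} V(x)\, \langle V(x), \cdot \rangle\, p(x)\, dx.$$

[The intermediate display and the remaining diagonal blocks of $Q$ are illegible in the source scan.]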
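It was shown in [3] that $QR$ is positive-definite and symmetric with operator norm less than 1 with respect to the inner product $\langle \cdot, Q^{-1} \cdot \rangle$ on the parameter space. It follows that $I - QR$ is positive-definite and symmetric with norm less than 1 with respect to $\langle \cdot, Q^{-1} \cdot \rangle$. Since $B$ and $Q$ commute, $\langle \cdot, Q^{-1}B^{-1} \cdot \rangle$ is an inner product on the parameter space, and one sees that $\langle W, Q^{-1} W \rangle \le \langle W, Q^{-1}B^{-1} W \rangle$ for $W$ in that space. Consequently, $B(I - QR)$ is positive-definite and symmetric with norm less than 1 with respect to the inner product $\langle \cdot, Q^{-1}B^{-1} \cdot \rangle$. One concludes that

$$E(\nabla\Phi_\varepsilon(\Theta^0)) = (1 - \varepsilon)\, I + \varepsilon\, E\left( \nabla \begin{pmatrix} A(\Theta^0) \\ M(\Theta^0) \\ S(\Theta^0) \end{pmatrix} \right)$$

has norm less than 1 with respect to $\langle \cdot, Q^{-1}B^{-1} \cdot \rangle$ whenever $0 < \varepsilon < 2$. This completes the proof of the theorem.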

We remark that, reasoning as in [3], one may determine a particular value of $\varepsilon$ (the "optimal $\varepsilon$") which yields, with probability 1 as $N_0$ approaches infinity, the fastest asymptotic uniform rates of local convergence of the iterative procedure (2) near $\Theta$. This optimal $\varepsilon$ is given by

$$\varepsilon = \frac{2}{2 - (\tau + \rho)},$$

where $\rho$ and $\tau$ are, respectively, the largest and smallest eigenvalues of $B(I - QR)$ regarded as an operator on the subspace of the parameter space determined by $H$. ($H$ is the subspace of $\mathbb{R}^m$ consisting of the vectors whose components sum to zero.) Since $\rho$ and $\tau$ lie between zero and 1, one sees that the optimal $\varepsilon$ is always greater than 1. If the component populations are "widely separated," then $\rho$ and $\tau$ are near zero and, hence, the optimal $\varepsilon$ is near 1. If two or more of the component populations are nearly indistinguishable and if $N_0$ is large relative to the $N_i$'s, then $\tau$ is near zero, and the optimal $\varepsilon$ cannot be much smaller than 2.
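The step-size formula can be checked numerically. The sketch below assumes, as the symmetry argument in the proof suggests, that the linearized iteration has eigenvalues $1 - \varepsilon(1 - g)$ for $g$ between $\tau$ and $\rho$, so that its worst-case contraction factor is attained at an endpoint; the function names are illustrative only.

```python
def optimal_step(tau, rho):
    """Step-size 2 / (2 - (tau + rho)) from the smallest and largest eigenvalues."""
    return 2.0 / (2.0 - (tau + rho))


def contraction_factor(eps, tau, rho):
    """Largest |1 - eps * (1 - g)| over g in [tau, rho] (attained at an endpoint)."""
    return max(abs(1.0 - eps * (1.0 - g)) for g in (tau, rho))
```

For example, with tau = 0.2 and rho = 0.6 the formula gives a step-size of 5/3, which brings the contraction factor down from 0.6 (at step-size 1) to 1/3.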


3. Samples of the second type.

We now assume that $K_0$ observations are obtained from the mixture population $\pi_0$, and that, for some $N_0 < K_0$, $N_0$ of these observations are left unidentified, while the remaining $K_0 - N_0$ observations are identified. For $i = 1, \ldots, m$, let $\{x_{ik}\}_{k=1,\ldots,N_i}$ denote the subset of the identified observations which come from $\pi_i$, and let $\{x_{0k}\}_{k=1,\ldots,N_0}$ be the set of unidentified observations from $\pi_0$. Up to an additive constant $c$ that does not depend on $\Theta$, the log-likelihood function for this sample is

$$L_2 = c + \sum_{i=1}^{m} N_i \log \alpha_i + \sum_{i=1}^{m} \sum_{k=1}^{N_i} \log p_i(x_{ik}) + \sum_{k=1}^{N_0} \log p(x_{0k}) = c + \sum_{i=1}^{m} \sum_{k=1}^{N_i} \log\left[\alpha_i\, p_i(x_{ik})\right] + \sum_{k=1}^{N_0} \log p(x_{0k}).$$

Differentiating the log-likelihood function and setting its partial derivatives to zero gives the likelihood equations

$$\text{(3.a)}\qquad \alpha_i = A_i(\Theta) \equiv \frac{N_i}{K_0} + \frac{\alpha_i}{K_0} \sum_{k=1}^{N_0} \frac{p_i(x_{0k})}{p(x_{0k})}$$

$$\text{(3.b)}\qquad \mu_i = M_i(\Theta)$$

$$\text{(3.c)}\qquad \Sigma_i = S_i(\Theta)$$

for $i = 1, \ldots, m$.


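We set

$$A(\Theta) = \begin{pmatrix} A_1(\Theta) \\ \vdots \\ A_m(\Theta) \end{pmatrix}$$

and define an operator $\Phi_\varepsilon$ on the parameter space by

$$\Phi_\varepsilon(\Theta) = (1 - \varepsilon)\,\Theta + \varepsilon \begin{pmatrix} A(\Theta) \\ M(\Theta) \\ S(\Theta) \end{pmatrix}.$$

Our iterative procedure is the following: Beginning with some starting value $\Theta^{(1)}$, define successive iterates inductively by

$$\text{(4)}\qquad \Theta^{(j+1)} = \Phi_\varepsilon(\Theta^{(j)})$$

for $j = 1, 2, 3, \ldots$. As before, the desired local convergence result for this iterative procedure follows from the theorem below.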
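For samples of the second type the proportion update also counts the identified observations. The sketch below is an illustration only: it assumes the update $A_i = N_i/K_0 + (\alpha_i/K_0)\sum_k p_i(x_{0k})/p(x_{0k})$ with $K_0 = N_0 + \sum_i N_i$, and the function names and toy density values are hypothetical.

```python
def density_ratios(alpha, component_densities):
    """ratios[i][k] = p_i(x_0k) / p(x_0k), with p = sum_i alpha_i * p_i."""
    n0 = len(component_densities[0])
    mix = [sum(a * dens[k] for a, dens in zip(alpha, component_densities))
           for k in range(n0)]
    return [[dens[k] / mix[k] for k in range(n0)] for dens in component_densities]


def alpha_update_second_type(alpha, n_identified, ratios):
    """A_i = N_i / K_0 + (alpha_i / K_0) * sum_k p_i(x_0k) / p(x_0k)."""
    n0 = len(ratios[0])
    k0 = n0 + sum(n_identified)  # every observation is identified or unidentified
    return [(n_i + a_i * sum(r_i)) / k0
            for a_i, n_i, r_i in zip(alpha, n_identified, ratios)]
```

The updated proportions again sum to one, since the identified counts contribute $(K_0 - N_0)/K_0$ and the weighted ratios contribute $N_0/K_0$.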


Theorem 2: With probability 1 as $N_0$ approaches infinity, $\Phi_\varepsilon$ is a locally contractive operator (in some norm on the parameter space) near the strongly consistent maximum-likelihood estimate whenever $0 < \varepsilon < 2$.

Proof of Theorem 2: If $\Theta$ is the strongly consistent maximum-likelihood estimate, then, as before, it suffices to show that, with probability 1, $\nabla\Phi_\varepsilon(\Theta)$ converges as $N_0$ approaches infinity to an operator which has operator norm less than 1 with respect to some vector norm on the parameter space.
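Proceeding as before, with $\beta_i(x) = p_i(x)/p(x)$ and $\beta = (\beta_1, \ldots, \beta_m)^T$, one sees that

$$\nabla_\alpha A(\Theta) = \left(\operatorname{diag}\left(1 - \frac{N_i}{\alpha_i K_0}\right)\right) - \left(\operatorname{diag} \frac{\alpha_i}{K_0}\right) \sum_{k=1}^{N_0} \beta(x_{0k})\, \beta(x_{0k})^T.$$

[The displayed expressions for $\nabla_\mu A(\Theta)$ and $\nabla_\Sigma A(\Theta)$ are illegible in the source scan.]

The remaining Fréchet derivatives, i.e., the derivatives at $\Theta$ of $M$ and $S$ with respect to $\alpha$, $\mu$, and $\Sigma$, are unchanged, except that [symbol illegible] must be replaced by [symbol illegible] wherever it appears.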

One obtains at $\Theta$ an expression, labeled (4) in the source, for

$$\nabla \begin{pmatrix} A \\ M \\ S \end{pmatrix}$$

of the same block form as in the preceding section. [The displayed matrices are illegible in the source scan.] In this expression, each $B_{jk}$ is the same as the corresponding $B_{jk}$ defined previously, except that each [symbol illegible] in the latter is replaced by [symbol illegible] in the former. One verifies that, with probability 1 as $N_0$ approaches infinity, (4) has the same limit as $B(I - QR)$, where $Q$ and $R$ are as before and $B = \frac{N_0}{K_0} I$. Repeating our earlier reasoning, one verifies that $B(I - QR)$ is positive-definite and symmetric with norm less than 1 with respect to the inner product $\langle \cdot, Q^{-1}B^{-1} \cdot \rangle$. Hence

$$\nabla\Phi_\varepsilon(\Theta) = (1 - \varepsilon)\, I + \varepsilon\, \nabla \begin{pmatrix} A(\Theta) \\ M(\Theta) \\ S(\Theta) \end{pmatrix}$$

converges to an operator which has norm less than 1 with respect to $\langle \cdot, Q^{-1}B^{-1} \cdot \rangle$ whenever $0 < \varepsilon < 2$. This completes the proof of the theorem.


The remarks concerning the "optimal $\varepsilon$" at the conclusion of the preceding section are valid here verbatim.
ABSTRACT

We develop a procedure for calculating a k x n rank k matrix B for data compression using the Bhattacharyya bound on the probability of error and an iterative construction using Householder transformations. Two sets of remotely sensed agricultural data are used to demonstrate the application of the procedure. The results of the applications give some indication of the extent to which the Bhattacharyya bound on the probability of error is affected by such transformations for multivariate normal populations.

1. INTRODUCTION

For n-dimensional normal classes $N(\mu_i, \Sigma_i)$, $i = 1, \ldots, m$, the Bhattacharyya coefficient (Andrews, 1972) for classes i and j is given by

$$\rho(i,j) = (q_i q_j)^{1/2} \int_{\mathbb{R}^n} \{p_i(x)\, p_j(x)\}^{1/2}\, dx$$

and the Bayes probability of error (Anderson, 1958) (Andrews, 1972) by

$$P_e = 1 - \int_{\mathbb{R}^n} \max_{1 \le i \le m} \{q_i\, p_i(x)\}\, dx,$$

where $p_i(x)$ denotes the conditional density of the random variable X given that $X \sim N(\mu_i, \Sigma_i)$, and $q_i$, $i = 1, \ldots, m$, respectively, denote the (known) a priori probabilities of the classes.

It has been shown (Andrews, 1972) (Kailath, 1967) that

$$P_e \le \rho \equiv \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} (q_i q_j)^{1/2} \int_{\mathbb{R}^n} \{p_i(x)\, p_j(x)\}^{1/2}\, dx.$$

If one considers a k x n rank k linear transformation B of the random variable X (i.e., Y = BX), then the Bhattacharyya coefficient for classes i and j for the classes $N(B\mu_i, B\Sigma_i B^T)$, $i = 1, \ldots, m$, is

$$\rho_B(i,j) = (q_i q_j)^{1/2} \int_{\mathbb{R}^k} \{p_i(y,B)\, p_j(y,B)\}^{1/2}\, dy$$

and the Bayes probability of error for the classes $N(B\mu_i, B\Sigma_i B^T)$, $i = 1, \ldots, m$, is

$$P_e(B) = 1 - \int_{\mathbb{R}^k} \max_{1 \le i \le m} \{q_i\, p_i(y,B)\}\, dy,$$

where $p_i(y,B)$, $i = 1, \ldots, m$, denotes the conditional density of the random variable Y = BX given that $Y \sim N(B\mu_i, B\Sigma_i B^T)$. It follows, with $\rho(B) \equiv \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \rho_B(i,j)$, that

$$P_e(B) \le \rho(B)$$

and moreover (Decell and Quirein, 1973) (Kailath, 1967), that

(1) $P_e \le P_e(B) \le \rho(B)$.

(2) $P_e = P_e(B)$ only if $\rho = \rho(B)$.
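The bound $P_e \le \rho$ can be illustrated in one dimension, where the Bhattacharyya coefficient of two normal densities has a closed form and the Bayes error can be integrated numerically. This is a minimal sketch under those assumptions; the function names, the integration limits, and the trapezoid rule are choices made for the example.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bhattacharyya_coefficient(q_i, q_j, mu_i, sigma_i, mu_j, sigma_j):
    """(q_i q_j)^(1/2) * integral of sqrt(p_i p_j) for two univariate normals."""
    var_sum = sigma_i ** 2 + sigma_j ** 2
    overlap = (math.sqrt(2.0 * sigma_i * sigma_j / var_sum)
               * math.exp(-(mu_i - mu_j) ** 2 / (4.0 * var_sum)))
    return math.sqrt(q_i * q_j) * overlap


def bhattacharyya_bound(priors, mus, sigmas):
    """Sum of the pairwise coefficients over i < j."""
    m = len(priors)
    return sum(bhattacharyya_coefficient(priors[i], priors[j], mus[i], sigmas[i],
                                         mus[j], sigmas[j])
               for i in range(m - 1) for j in range(i + 1, m))


def bayes_error(priors, mus, sigmas, lo, hi, steps=20000):
    """1 - integral of max_i q_i p_i(x), by the trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        x = lo + s * h
        f = max(q * normal_pdf(x, mu, sg) for q, mu, sg in zip(priors, mus, sigmas))
        total += f if 0 < s < steps else 0.5 * f
    return 1.0 - total * h
```

For two equally likely unit-variance classes with means 0 and 2, the Bayes error is about 0.159 while the bound is about 0.303, so the bound holds but is not tight.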


2. THEORETICAL PRELIMINARIES

Let k be an integer (0 < k < n), and let $N(\mu_i, \Sigma_i)$, $i = 1, \ldots, m$, be n-variate normal populations with a priori probabilities $q_1, \ldots, q_m$. We would like to construct a k x n rank k matrix B that will minimize $\rho(B)$. The theoretical extent to which this is possible and the basis for the construction (Decell and Smiley, to appear) is summarized in the following theorem. Let $C = \{u \in \mathbb{R}^n : \|u\| = 1\}$ and $T(H) = \{H = I - 2uu^T : u \in C\}$ denote the set of Householder transformations on $\mathbb{R}^n$ (Householder, 1958).

Theorem. For each positive integer i, let $H_i \in T(H)$ be chosen such that

$$\rho((I_k|Z)H_1) = \underset{H \in T(H)}{\text{g.l.b.}}\ \rho((I_k|Z)H)$$

and [the second defining condition is illegible in the source scan]; then,

(1) $\rho((I_k|Z)H_{i+1}H_i \cdots H_1) \le \rho((I_k|Z)H_i \cdots H_1)$.

(2) $\rho((I_k|Z)H_{i+1} \cdots H_1) \le \rho((I_k|Z)H_i \cdots H_1 H)$, $H \in T(H)$.

(3) $\rho((I_k|Z)H_{i+1}H_i \cdots H_1) \le \rho((I_k|Z)HH_i \cdots H_1)$, $H \in T(H)$.

(4) $\rho((I_k|Z)H_{i+1}H_i \cdots H_1) \le$ [right-hand side illegible in the source scan], $H \in T(H)$ and $p = 0, \ldots, i-2$.

(5) The monotone sequence of real numbers $\{\rho(B_i)\}_{i=1}^{\infty}$, where $B_i = (I_k|Z)H_i \cdots H_1$, is bounded below and hence

$$\lim_{i \to \infty} \rho(B_i) = \text{g.l.b.}\ \{\rho(B_i)\}.$$

We know (Decell and Quirein, 1973) that there is some k x n rank k matrix, say $\hat{B}$, that minimizes $\rho(B)$. If $\rho(\hat{B}) < \text{g.l.b.}\ \{\rho(B_i)\}$ we will call the sequence $\{B_i\}$ suboptimal (optimal in the case of equality). There are several results (Decell and Smiley, to appear) that lend credibility to the conjecture that the sequence is optimal and cofinally constant beyond the index $i = \min\{k, n-k\}$. We will proceed with the development of an iterative procedure for constructing the subject sequence and, finally, tabulate results of applications to remotely sensed agricultural data with equal a priori class probabilities. The approach (and its merit) will depend upon the bound provided by the inequality $P_e \le \rho(B_i)$, $i = 1, 2, \ldots$, the non-increasing nature of the sequence $\{\rho(B_i)\}$, and the ability to manipulate the expressions for $\rho(B_i)$, $i = 1, 2, \ldots$, in the case of normal populations.
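The compression matrices in the theorem have a simple computational form: multiplying by $(I_k|Z)$ on the left just keeps the first k rows of a product of Householder matrices. The sketch below illustrates this with plain nested lists; the function names are chosen for the example and are not from the paper.

```python
import math


def householder(u):
    """Return H = I - 2 v v^T, where v is u scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in u))
    v = [c / norm for c in u]
    n = len(v)
    return [[(1.0 if r == c else 0.0) - 2.0 * v[r] * v[c] for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[r][t] * b[t][c] for t in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def compression_matrix(k, transforms):
    """Return (I_k | Z) H_i ... H_1 for transforms = [H_1, ..., H_i]."""
    n = len(transforms[0])
    product = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for h in transforms:
        product = matmul(h, product)  # left-multiply so that H_1 acts first
    return product[:k]  # (I_k | Z) selects the first k rows
```

Because each Householder matrix is symmetric and orthogonal, the k rows of the result are orthonormal, so the matrix always has rank k.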


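3. THE GRADIENT OF $\rho((I_k|Z)H)$

We will develop an expression (for the case of normal n-variate populations $N(\mu_i, \Sigma_i)$, $i = 1, \ldots, m$) for the gradient of $\rho((I_k|Z)H)$ where $H \in T(H)$ has the form $H = I - 2\,\dfrac{xx^T}{x^Tx}$, $x \in \mathbb{R}^n$, $x \ne 0$. This expression will be used in a steepest descent procedure to calculate each Householder transformation $H_1, H_2, H_3, \ldots$ described in the preceding theorem. For m populations $N(\mu_i, \Sigma_i)$, $i = 1, \ldots, m$, it is easy to establish that in order to calculate $H_{l+1}$ one need only apply the steepest descent procedure to the Bhattacharyya coefficient determined by the populations

$$N(H_l \cdots H_1 \mu_i,\ H_l \cdots H_1 \Sigma_i H_1 \cdots H_l), \qquad i = 1, \ldots, m.$$

The expression for $\rho_{(I_k|Z)H}(i,j)$ is given by (Andrews, 1972) (Kailath, 1967) (for the case of equal a priori probabilities $q_i = 1/m$, $i = 1, \ldots, m$)

$$\rho_{(I_k|Z)H}(i,j) = \frac{1}{m} \exp\left\{ -\tfrac{1}{4}\, \hat{\delta}_{ij}^T (\hat{\Sigma}_i + \hat{\Sigma}_j)^{-1} \hat{\delta}_{ij} - \tfrac{1}{2} \ln \frac{|\hat{\Sigma}_i + \hat{\Sigma}_j|}{2^k\, |\hat{\Sigma}_i|^{1/2}\, |\hat{\Sigma}_j|^{1/2}} \right\},$$

where $\hat{\delta}_{ij} = (I_k|Z)H(\mu_i - \mu_j)$ and $\hat{\Sigma}_i = (I_k|Z)H\Sigma_i H(I_k|Z)^T$, in which case

$$\rho((I_k|Z)H) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \rho_{(I_k|Z)H}(i,j).$$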


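If we define

$$F_{ij} = -\tfrac{1}{4}\, \hat{\delta}_{ij}^T (\hat{\Sigma}_i + \hat{\Sigma}_j)^{-1} \hat{\delta}_{ij}, \qquad G_{ij} = -\tfrac{1}{2} \ln \frac{|\hat{\Sigma}_i + \hat{\Sigma}_j|}{2^k\, |\hat{\Sigma}_i|^{1/2}\, |\hat{\Sigma}_j|^{1/2}},$$

we have that the differential

$$d(\rho_{(I_k|Z)H}(i,j)) = \frac{1}{m} \exp(F_{ij} + G_{ij})\, (d(F_{ij}) + d(G_{ij})),$$

from whence it follows that

$$d(\rho((I_k|Z)H)) = \frac{1}{m} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \exp(F_{ij} + G_{ij})\, (d(F_{ij}) + d(G_{ij})).$$

In order to simplify the notation, define $\Sigma_{ij} = \Sigma_i + \Sigma_j$ and $\Delta_{ij} = (\mu_i - \mu_j)(\mu_i - \mu_j)^T$. Let $\operatorname{tr}(\cdot)$ denote the trace of $(\cdot)$ and $|\cdot| = \det(\cdot)$. With a bit of matrix algebra it follows that

$$F_{ij} = -\tfrac{1}{4} \operatorname{tr}\{ ((I_k|Z)H\Sigma_{ij}H(I_k|Z)^T)^{-1}\, (I_k|Z)H\Delta_{ij}H(I_k|Z)^T \}$$

$$G_{ij} = -\tfrac{1}{2} \ln|(I_k|Z)H\Sigma_{ij}H(I_k|Z)^T| + \tfrac{1}{4} \ln|(I_k|Z)H\Sigma_i H(I_k|Z)^T| + \tfrac{1}{4} \ln|(I_k|Z)H\Sigma_j H(I_k|Z)^T| + \tfrac{k}{2} \ln 2.$$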


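We will now develop expressions for $d(F_{ij})$ and $d(G_{ij})$, $i, j = 1, \ldots, m$. According to Decell and Quirein (1973)

$$d(F_{ij}) = -\tfrac{1}{2} \operatorname{tr}\{ d((I_k|Z)H)\, Q_{ij} \},$$

where $B = (I_k|Z)H$ and

$$Q_{ij} = [\Delta_{ij}B^T - \Sigma_{ij}B^T (B\Sigma_{ij}B^T)^{-1} B\Delta_{ij}B^T]\, (B\Sigma_{ij}B^T)^{-1}.$$

Since $H = I - 2\,\dfrac{xx^T}{x^Tx}$ it follows that

$$d((I_k|Z)H) = d\left( (I_k|Z)\left(I - 2\,\frac{xx^T}{x^Tx}\right) \right) = -2(I_k|Z)\, d\left(\frac{xx^T}{x^Tx}\right)$$

$$= -2(I_k|Z)\, \frac{x^Tx\, d(xx^T) - xx^T\, d(x^Tx)}{(x^Tx)^2}$$

$$= -\frac{2(I_k|Z)}{(x^Tx)^2}\, \{ x^Tx\, (d(x)x^T + x\,d(x)^T) - xx^T (d(x)^Tx + x^Td(x)) \}$$

$$= -\frac{2(I_k|Z)}{(x^Tx)^2}\, \{ d(x)x^Txx^T + xx^Tx\,d(x)^T - xx^Td(x)x^T - x\,d(x)^Txx^T \}$$

$$= -\frac{2(I_k|Z)}{(x^Tx)^2}\, \{ (d(x)x^T - x\,d(x)^T)\,xx^T - xx^T (d(x)x^T - x\,d(x)^T) \}.$$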
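The final expression for the differential can be sanity-checked against a finite difference. The sketch below drops the constant left factor $(I_k|Z)$ and compares $dH$ in a direction $dx$ with $(H(x + t\,dx) - H(x))/t$ for a small $t$; the helper names and the test vectors are assumptions made for the example.

```python
def outer(a, b):
    """Outer product a b^T as a nested list."""
    return [[x * y for y in b] for x in a]


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[r][t] * b[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]


def subtract(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def householder(x):
    """H = I - 2 x x^T / (x^T x) for a non-zero vector x."""
    s = sum(c * c for c in x)
    n = len(x)
    return [[(1.0 if r == c else 0.0) - 2.0 * x[r] * x[c] / s for c in range(n)]
            for r in range(n)]


def householder_differential(x, dx):
    """dH = -2 / (x^T x)^2 * {(dx x^T - x dx^T) x x^T - x x^T (dx x^T - x dx^T)}."""
    s = sum(c * c for c in x)
    skew = subtract(outer(dx, x), outer(x, dx))
    xxt = outer(x, x)
    inner = subtract(matmul(skew, xxt), matmul(xxt, skew))
    return [[-2.0 * e / s ** 2 for e in row] for row in inner]


def finite_difference(x, dx, t=1e-6):
    """(H(x + t dx) - H(x)) / t, a first-order approximation of dH."""
    shifted = [a + t * b for a, b in zip(x, dx)]
    diff = subtract(householder(shifted), householder(x))
    return [[e / t for e in row] for row in diff]
```

Agreement between the two matrices to within the finite-difference error supports the algebra in the derivation above.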
(xx)^ 


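Substituting the latter in the expression

$$d(F_{ij}) = -\tfrac{1}{2} \operatorname{tr}\{ d((I_k|Z)H)\, Q_{ij} \}$$

and using the fact that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, we have

$$d(F_{ij}) = -\tfrac{1}{2} \operatorname{tr}\left\{ -\frac{2(I_k|Z)}{(x^Tx)^2}\, [(d(x)x^T - x\,d(x)^T)\,xx^T - xx^T (d(x)x^T - x\,d(x)^T)]\, Q_{ij} \right\}$$

$$= \frac{1}{(x^Tx)^2} \operatorname{tr}\{ Q_{ij}(I_k|Z)\, [(d(x)x^T - x\,d(x)^T)\,xx^T - xx^T (d(x)x^T - x\,d(x)^T)] \}$$

$$= \frac{1}{(x^Tx)^2} \operatorname{tr}\{ xx^TQ_{ij}(I_k|Z)(d(x)x^T - x\,d(x)^T) - Q_{ij}(I_k|Z)\,xx^T (d(x)x^T - x\,d(x)^T) \}.$$

With a little matrix algebra (and some patience) it follows that

$$d(F_{ij}) = \frac{1}{(x^Tx)^2} \operatorname{tr}\{ [(xx^TQ_{ij}(I_k|Z) - Q_{ij}(I_k|Z)\,xx^T)^T - (xx^TQ_{ij}(I_k|Z) - Q_{ij}(I_k|Z)\,xx^T)]\; x\,d(x)^T \}.$$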

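We now find an expression for $d(G_{ij})$. First, recall (Kullback, 1968) that

$$d(\ln|B\Sigma B^T|) = 2 \operatorname{tr}\{ d(B)\, \Sigma B^T (B\Sigma B^T)^{-1} \}$$

so that

$$d(G_{ij}) = -\operatorname{tr}\{ d((I_k|Z)H)\, \Sigma_{ij}H(I_k|Z)^T ((I_k|Z)H\Sigma_{ij}H(I_k|Z)^T)^{-1} \}$$

$$\qquad + \tfrac{1}{2} \operatorname{tr}\{ d((I_k|Z)H)\, \Sigma_i H(I_k|Z)^T ((I_k|Z)H\Sigma_i H(I_k|Z)^T)^{-1} \}$$

$$\qquad + \tfrac{1}{2} \operatorname{tr}\{ d((I_k|Z)H)\, \Sigma_j H(I_k|Z)^T ((I_k|Z)H\Sigma_j H(I_k|Z)^T)^{-1} \}.$$

Obviously, the summands in the expression for $d(G_{ij})$ differ from the expression

$$d(F_{ij}) = -\tfrac{1}{2} \operatorname{tr}\{ d((I_k|Z)H)\, Q_{ij} \}$$

only by multiplicative constants and the matrix $Q_{ij}$. Hence, we may use the final expression for $d(F_{ij})$ to obtain the expression for $d(G_{ij})$ by simply adjusting the multiplicative constants and replacing $Q_{ij}$ (in each summand in $d(G_{ij})$) with the expressions

$$J_{ij} = \Sigma_{ij}H(I_k|Z)^T [(I_k|Z)H\Sigma_{ij}H(I_k|Z)^T]^{-1}$$

$$K_{ij} = \Sigma_i H(I_k|Z)^T [(I_k|Z)H\Sigma_i H(I_k|Z)^T]^{-1}$$

$$L_{ij} = \Sigma_j H(I_k|Z)^T [(I_k|Z)H\Sigma_j H(I_k|Z)^T]^{-1}.$$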


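At this point we will simplify the notation. Let

$$\hat{Q}_{ij} = (xx^TQ_{ij}(I_k|Z) - Q_{ij}(I_k|Z)\,xx^T)^T - (xx^TQ_{ij}(I_k|Z) - Q_{ij}(I_k|Z)\,xx^T)$$

and let $\hat{J}_{ij}$, $\hat{K}_{ij}$ and $\hat{L}_{ij}$ be similarly defined by substituting, respectively, $J_{ij}$, $K_{ij}$ and $L_{ij}$ for $Q_{ij}$ in the expression for $\hat{Q}_{ij}$, $i, j = 1, \ldots, m$. It follows that

$$d(F_{ij}) = \frac{1}{(x^Tx)^2} \operatorname{tr}(\hat{Q}_{ij}\, x\,d(x)^T)$$

$$d(G_{ij}) = \frac{2}{(x^Tx)^2} \operatorname{tr}(\hat{J}_{ij}\, x\,d(x)^T) - \frac{1}{(x^Tx)^2} \operatorname{tr}(\hat{K}_{ij}\, x\,d(x)^T) - \frac{1}{(x^Tx)^2} \operatorname{tr}(\hat{L}_{ij}\, x\,d(x)^T).$$

In order that x be extremal, it is necessary that x satisfy

$$G(x) = \frac{1}{m\,(x^Tx)^2} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \exp(F_{ij} + G_{ij})\, [\hat{Q}_{ij} + 2\hat{J}_{ij} - \hat{K}_{ij} - \hat{L}_{ij}]\, x = 0.$$

Of course, the function G(x) is the gradient of $\rho\left((I_k|Z)\left(I - 2\,\dfrac{xx^T}{x^Tx}\right)\right)$ with respect to x.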


With G(x), we use a steepest descent technique to construct $H_1$. The process is repeated for the construction of $H_2$ since, given $H_1$, the problem of constructing $H_2$ is identical to that of constructing $H_1$ provided the populations are taken to be $N(H_1\mu_i, H_1\Sigma_i H_1)$, $i = 1, \ldots, m$.

Test results are presented in the following tables. The C-1 flight line data is twelve-channel data for nine agricultural classes: soybeans, corn, oats, red-clover, alfalfa, rye, bare soil, and two types of wheat. The Hill County data is sixteen-channel data for five agricultural classes: winter wheat, fallow crop, barley, grass, and stubble.


C-1 FLIGHT LINE DATA
n = 12, m = 9, k = 6, ρ = .024

Iteration    H_1      H_2      H_3
0            .327     .109     .134
1            .223     .060     .034
2            .171     .062     .033
3            .135     .068     .032
4            .116     .058     .031
5            .1157    .055     .0309
6            .1150    .054     .0303

HILL COUNTY DATA
n = 16, m = 5, k = 6, ρ = .107

Iteration    H_1      H_2      H_3
0            .872     .336     .299
1            .785     .310     .287
2            .525     .286     .232
3            .439     .273     .227
4            .576     .267     .226
5            .386     .265     .224
6            .363     .264     .223
ABSTRACT

Classifying large quantities of multidimensional data (e.g., remotely sensed agricultural data) (Remote, 1968) requires efficient and effective classification techniques and the construction of certain transformations of a dimension-reducing, information-preserving nature. This paper will deal with the construction of transformations that minimally degrade information (i.e., class separability). We will only consider the construction of linear dimension-reducing transformations for multivariate normal populations, and information content will be measured by divergence (Kullback, 1968).

1. INTRODUCTION

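For n-dimensional normal classes $N(m_i, V_i)$, $i = 1, \ldots, m$, the divergence between classes i and j (Kullback, 1968) is given by

$$D_{ij} = \tfrac{1}{2} \operatorname{tr}[(V_i - V_j)(V_j^{-1} - V_i^{-1})] + \tfrac{1}{2} \operatorname{tr}[(V_i^{-1} + V_j^{-1})(m_i - m_j)(m_i - m_j)^T].$$

Let $\delta_{ij} = m_i - m_j$. Then

$$D_{ij} = \tfrac{1}{2} \operatorname{tr}[(V_i - V_j)(V_j^{-1} - V_i^{-1})] + \tfrac{1}{2} \operatorname{tr}[(V_i^{-1} + V_j^{-1})\,\delta_{ij}\delta_{ij}^T]$$

$$\quad = \tfrac{1}{2} \operatorname{tr}[V_i^{-1}(V_j + \delta_{ij}\delta_{ij}^T)] + \tfrac{1}{2} \operatorname{tr}[V_j^{-1}(V_i + \delta_{ij}\delta_{ij}^T)] - n.$$

The interclass divergence (Decell and Quirein, Oct. 1973) for m populations is given by

$$D = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} D_{ij}$$

and it follows that

$$D = \tfrac{1}{2} \operatorname{tr}\left[ \sum_{i=1}^{m} V_i^{-1} S_i \right] - \frac{m(m-1)}{2}\, n,$$

where

$$S_i = \sum_{j \ne i} (V_j + \delta_{ij}\delta_{ij}^T).$$

If B is a k x n rank k matrix, the B-interclass divergence (Decell and Quirein, Oct. 1973) is given by

$$D_B = \tfrac{1}{2} \operatorname{tr}\left[ \sum_{i=1}^{m} (BV_iB^T)^{-1} (BS_iB^T) \right] - \frac{m(m-1)}{2}\, k.$$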


As in the case of average interclass divergence, the B-interclass divergence is a measure of the "separation" in the classes $N(Bm_i, BV_iB^T)$, $i = 1, \ldots, m$, and is a useful tool for constructing rank k linear transformations that preserve "class separability". It has been shown (Decell and Quirein, Oct. 1973) that whenever $D = D_B$, the probability of misclassification (Anderson, 1958) for the classes $N(Bm_i, BV_iB^T)$, $i = 1, \ldots, m$, is the same as the probability of misclassification for the classes $N(m_i, V_i)$, $i = 1, \ldots, m$.
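The two ways of writing the interclass divergence can be compared numerically in the univariate case (n = 1), where every trace reduces to a scalar. The sketch assumes the interclass divergence is the sum of the pairwise divergences over i < j; the function names and the sample values are illustrative.

```python
def pair_divergence(m_i, v_i, m_j, v_j):
    """Divergence between N(m_i, v_i) and N(m_j, v_j) for scalar variances."""
    d2 = (m_i - m_j) ** 2
    return (0.5 * (v_i - v_j) * (1.0 / v_j - 1.0 / v_i)
            + 0.5 * (1.0 / v_i + 1.0 / v_j) * d2)


def interclass_divergence(means, variances):
    """Sum of the pairwise divergences over i < j."""
    m = len(means)
    return sum(pair_divergence(means[i], variances[i], means[j], variances[j])
               for i in range(m - 1) for j in range(i + 1, m))


def interclass_divergence_trace_form(means, variances):
    """(1/2) * sum_i S_i / V_i - m (m - 1) n / 2 with n = 1."""
    m = len(means)
    total = 0.0
    for i in range(m):
        s_i = sum(variances[j] + (means[i] - means[j]) ** 2
                  for j in range(m) if j != i)
        total += s_i / variances[i]
    return 0.5 * total - 0.5 * m * (m - 1)
```

Both functions return the same value for any set of classes, which is a quick check on the algebra behind the trace form.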


2. THEORETICAL PRELIMINARIES

We will assume that k is an integer (k < n) and develop a procedure for selecting a k x n rank k matrix B such that $D_B$ is maximum. The procedure will be based upon the following theorem (Decell and Smiley, to appear). We will let $C = \{u \in \mathbb{R}^n : \|u\| = 1\}$ and $T(H) = \{H = I - 2uu^T : u \in C\}$ denote the set of Householder transformations defined on $\mathbb{R}^n$ (Householder, 1968).

Theorem. For each positive integer i let $H_i \in T(H)$ be inductively chosen such that [the defining conditions are illegible in the source scan]. The following hold:

(1)-(3) [Statements illegible in the source scan.]

(4) $D_{(I_k|Z)H_iH_{i-1} \cdots H_{i-(p-1)}HH_{i-(p+1)} \cdots H_1} \le D_{(I_k|Z)H_{i+1}H_i \cdots H_1}$ for every $H \in T(H)$, $p = 0, 1, \ldots, i-2$.

(5) The monotone sequence $\{D_{B_i}\}_{i=1}^{\infty} = \{D_{(I_k|Z)H_i \cdots H_1}\}_{i=1}^{\infty}$ is bounded above, and hence

$$\lim_{i \to \infty} D_{(I_k|Z)H_i \cdots H_1} = \text{l.u.b.}\ \{D_{B_i}\}.$$


We would, of course, be pleased if it were the case that $\text{l.u.b.}\ \{D_{B_i}\} = D$. This, unfortunately, is not always the case for some choice of k < n and is not possible, in general, for any k < n. We do know that there is some k x n rank k matrix $\hat{B}$ for which $D_{\hat{B}}$ is maximum and, in general, that $D_B \le D$ (Decell and Quirein, Oct. 1973). It follows, moreover, that since the matrices of the form $(I_k|Z)H_i \cdots H_1$ have rank k,

$$D_{B_i} \le D_{\hat{B}} \le D$$

for every integer i. We will call the sequence $\{B_i = (I_k|Z)H_i \cdots H_1\}_{i=1}^{\infty}$ suboptimal whenever

$$\text{l.u.b.}\ \{D_{B_i}\} < D_{\hat{B}}$$

(and optimal in the case of equality).

There are several open theoretical questions that deal with the conjecture that the sequence is, in general, optimal and cofinally constant beyond the index $i = \min\{k, n-k\}$ (Decell and Smiley, to appear). In what follows we will develop a procedure for constructing the subject sequence and demonstrate its application to agricultural data.


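3. THE GRADIENT OF $D_B$

It has been shown (Quirein, Nov. 1972) that the differential $dD_B$ of $D_B$ (regarded as a function of the k x n matrix B) can be expressed in the form $dD_B = F + G$, where, when the indicated inverses exist,

$$F = \tfrac{1}{2} \operatorname{tr}\left[ \sum_{i=1}^{m} (BV_iB^T)^{-1} (dB\, S_iB^T + BS_i\, dB^T) \right]$$

$$\quad = \tfrac{1}{2} \operatorname{tr}\left[ \sum_{i=1}^{m} (dB\, S_iB^T)(BV_iB^T)^{-1} \right] + \tfrac{1}{2} \operatorname{tr}\left[ \sum_{i=1}^{m} (BV_iB^T)^{-1}(BS_i\, dB^T) \right]$$

$$\quad = \operatorname{tr}\left[ \sum_{i=1}^{m} (dB\, S_iB^T)(BV_iB^T)^{-1} \right]$$

and

$$G = -\tfrac{1}{2} \operatorname{tr}\left[ \sum_{i=1}^{m} (BV_iB^T)^{-1} (dB\, V_iB^T + BV_i\, dB^T)(BV_iB^T)^{-1}(BS_iB^T) \right]$$

$$\quad = -\tfrac{1}{2} \operatorname{tr}\left[ \sum_{i=1}^{m} (dB\, V_iB^T)(BV_iB^T)^{-1}(BS_iB^T)(BV_iB^T)^{-1} \right] - \tfrac{1}{2} \operatorname{tr}\left[ \sum_{i=1}^{m} (BV_iB^T)^{-1}(BS_iB^T)(BV_iB^T)^{-1}(BV_i\, dB^T) \right]$$

$$\quad = -\operatorname{tr}\left[ \sum_{i=1}^{m} (dB\, V_iB^T)(BV_iB^T)^{-1}(BS_iB^T)(BV_iB^T)^{-1} \right].$$

Thus,

$$dD_B = \operatorname{tr}\left[ \sum_{i=1}^{m} dB\, \{ S_iB^T - V_iB^T (BV_iB^T)^{-1} (BS_iB^T) \}(BV_iB^T)^{-1} \right] = \operatorname{tr}\left[ \sum_{i=1}^{m} dB\, Q_i \right],$$

where

$$Q_i = [\{ S_iB^T - V_iB^T (BV_iB^T)^{-1} (BS_iB^T) \}(BV_iB^T)^{-1}].$$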

We are, of course, interested in extremizing $D_B$ over the particular subclass of k x n rank k matrices of the form $(I_k|Z)H$ where $H \in T(H)$ (e.g., for i = 1 we find $H_1$ that maximizes $D_{(I_k|Z)H}$). Actually, one need only consider what is required to compute $H_1$. The computation of $H_2$ is accomplished by the same procedure as that for $H_1$. It is simply a matter of, after selecting $H_1$, redefining the m classes to be $N(H_1m_i, H_1V_iH_1)$, $i = 1, \ldots, m$, and proceeding as in the selection of $H_1$.

With these facts in mind we will simply calculate the gradient of $D_B$ where B is restricted to having the form $B = (I_k|Z)H$, $H \in T(H)$. The restriction $H \in T(H)$ can be accomplished by considering those k x n rank k matrices of the form

$$B = (I_k|Z)\left(I - 2\,\frac{ww^T}{w^Tw}\right), \qquad w \in \mathbb{R}^n\ (w \ne 0).$$

It follows that

$$dB = d\left[ (I_k|Z)\left(I - 2\,\frac{ww^T}{w^Tw}\right) \right] = -2(I_k|Z)\, d\left(\frac{ww^T}{w^Tw}\right)$$

$$= -2(I_k|Z) \left[ \frac{w^Tw\, d(ww^T) - ww^T\, d(w^Tw)}{(w^Tw)^2} \right]$$

$$= -\frac{2(I_k|Z)}{(w^Tw)^2}\, [w^Tw\,(dw\, w^T + w\, dw^T) - ww^T (w^Tdw + dw^Tw)]$$

$$= -\frac{2(I_k|Z)}{(w^Tw)^2}\, [dw\, w^Tww^T + ww^Tw\, dw^T - ww^Tdw\, w^T - w\, dw^Tww^T]$$

$$= -\frac{2(I_k|Z)}{(w^Tw)^2}\, [(dw\, w^T - w\, dw^T)\,ww^T - ww^T (dw\, w^T - w\, dw^T)].$$

Substituting the latter in the expression for $dD_B$,

$$dD_B = \operatorname{tr} \sum_{i=1}^{m} \left[ -\frac{2(I_k|Z)}{(w^Tw)^2}\, \{ (dw\, w^T - w\, dw^T)\,ww^T - ww^T (dw\, w^T - w\, dw^T) \}\, Q_i \right]$$

$$= -\frac{2}{(w^Tw)^2} \operatorname{tr} \sum_{i=1}^{m} [\, ww^TQ_i(I_k|Z)(dw\, w^T - w\, dw^T) - Q_i(I_k|Z)\,ww^T (dw\, w^T - w\, dw^T) \,]$$

$$= -\frac{2}{(w^Tw)^2} \operatorname{tr} \sum_{i=1}^{m} [\, M_i\, dw\, w^T - M_i\, w\, dw^T - N_i\, dw\, w^T + N_i\, w\, dw^T \,],$$

where $M_i = ww^TQ_i(I_k|Z)$ and $N_i = Q_i(I_k|Z)\,ww^T$. Using $\operatorname{tr}(M\, dw\, w^T) = \operatorname{tr}(M^T w\, dw^T)$,

$$dD_B = -\frac{2}{(w^Tw)^2} \operatorname{tr}\left[ \sum_{i=1}^{m} \{ M_i^T w\, dw^T - N_i^T w\, dw^T + N_i\, w\, dw^T - M_i\, w\, dw^T \} \right]$$

$$= -\frac{2}{(w^Tw)^2} \operatorname{tr}\left[ \sum_{i=1}^{m} \{ (M_i - N_i)^T - (M_i - N_i) \}\, w\, dw^T \right].$$

The necessary condition that w be extremal is then

$$G(w) = -\frac{2}{(w^Tw)^2} \sum_{i=1}^{m} \{ (M_i - N_i)^T - (M_i - N_i) \}\, w = \theta \quad \text{(the zero vector)}.$$

We note that G(w) is the gradient of $D_{(I_k|Z)\left(I - 2\frac{ww^T}{w^Tw}\right)}$ with respect to w, and we use a steepest descent procedure for finding the extremal w. The process is repeated for each sequential index until corresponding values of divergence "stabilize." Test results are presented in the following tables. The C-1 flight line data is twelve-channel data for nine agricultural classes: soybeans, corn, oats, red-clover, alfalfa, rye, bare soil, and two types of wheat. The Hill County data is sixteen-channel data for five agricultural classes: winter wheat, fallow crop, barley, grass, and stubble.

The starting value $w_0$ for the steepest descent procedure for selecting each successive Householder transformation $H_1, H_2, \ldots$ was arbitrarily chosen to be

$$w_0 = \left( \frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}} \right)^T.$$

Choosing starting values in this arbitrary fashion is certainly not the most clever thing to do in the presence of the monotone behavior of the sequence $\{D_{B_i}\}$. One would expect, for example, that the starting values for the selection of $H_{i+1}$ should depend upon the unit vectors previously selected as generators of $H_1, \ldots, H_i$ in such a way as to guarantee that the starting value $w_0$ for the descent procedure for selecting $H_{i+1}$ satisfies

$$D_{(I_k|Z)\left(I - 2\frac{w_0w_0^T}{w_0^Tw_0}\right)H_i \cdots H_1} \ge D_{(I_k|Z)H_i \cdots H_1}.$$

This rather arbitrary selection of the starting vector does, as the examples demonstrate, violate the latter inequality. The question about how to choose starting vectors, according to the latter inequality, is still an open one and its answer would certainly decrease computation time.


C-1 Flight Line Data
n = 12, k = 6, m = 9, D = 10,660

Iteration for H_1

Iteration*   Divergence D_B
1            1982
2            3536
3            4533
4            5781
5            6910
6            7522
7            7710
8            7790
9            7838
10           7865
11           7881
12           7892

Hill County Data
n = 16, k = 8, m = 5, D = 636

Iteration for H_1

Iteration*   Divergence D_B
1            114.58
2            136.66
3            152.27
4            179.69
5            223.81
6            247.42
7            252.78
8            257.12
9            260.74
10           263.95

*Iteration counter








































C-1 Flight Line Data (cont.)

Iteration for H_2

Iteration    Divergence D_B
1            7815
2            8797
3            9542
4            9785
5            9901
6            9966
7            10,005
8            10,031
9            10,048

Iteration for H_3

Iteration    Divergence D_B
1            7582
2            8705
3            9809
4            9947
5            9995
6            10,020
7            10,037
8            10,049
9            10,058

Hill County Data (cont.)

Iteration for H_2

Iteration    Divergence D_B
1            269.00
2            280.48
3            293.32
4            300.68
5            304.07
6            306.19
7            307.74
8            308.95
9            309.93

Iteration for H_3

[Values not legible in the source scan.]
DOCUMENTATION

Computation of the Total and the B-average Bhattacharyya Distance:

(Univac 1108, Univ. of Houston).

This program consists of 3 subroutines to be executed in the following sequence:

(1) Subroutine BHATT

(2) Subroutine BHATB1

(3) Subroutine BHATB2

1. SUBROUTINE BHATT

ABSTRACT

This subroutine calculates the total Bhattacharyya Distance, BDIST, using all N channels. The output of this program, BDIST, will be used in computing the difference δ = H_B - BDIST, where H_B is the B-average Bhattacharyya Distance computed in the subroutines BHATB1, BHATB2.

User's Information:

(Double Precision Version Only).

In order to use this subroutine the following FORTRAN calling sequence must be given:

CALL BHATT (COVAR, XMEAN, M, N, BDIST)

where: 

COVAR (input)   is a real 3-dimensional array (M x N x N) and contains 
                the M N x N class covariance matrices (positive 
                definite symmetric) used as input. 

XMEAN (input)   is a real 2-dimensional array (M x N) and contains 
                the M N-dimensional class mean vectors. 

M (input)       is the no. of classes under consideration, i.e. the 
                no. of covariance matrices and mean vectors. 

N (input)       is the dimension of the covariance matrices and the 
                mean vectors. 

BDIST (output)  is the value of the total Bhattacharyya Distance 
                computed by subroutine BHATT. 

SUBROUTINES USED: 

Subroutine BHATT in turn calls the following subroutines: 

1. Subroutine MATMUL. This subroutine computes the product of 2 
   matrices. It calls subroutines SUPSUM and ORDER. 

2. Subroutine CHLSKY. This subroutine computes the inverse of a 
   positive definite symmetric matrix. 

3. Subroutine DET. This subroutine computes the determinant of a 
   positive definite symmetric matrix. 

NOTE: (1) The format statements for input and output are dependent upon the 
dimensions of the input data, and corresponding adjustments have to be made to 
the formats when different sets of data are run. 

(2) The variables declared in the DIMENSION statements have to similarly 
correspond to the dimensions of the input data. 

ALGORITHM: 

Subroutine BHATT computes the value of the total Bhattacharyya Distance 
using the covariance matrices and mean vectors as inputs. 

The total Bhattacharyya Distance, BDIST, is computed by the formula 

$$\mathrm{BDIST} = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} H(i,j)$$

where H(i,j), the interclass Bhattacharyya Distance between classes i and 
j, is given by 

$$H(i,j) = \exp\left[ -\tfrac{1}{4}\, \delta_{ij}^{T} (\Sigma_i + \Sigma_j)^{-1} \delta_{ij} - \tfrac{1}{2} \ln \frac{|\Sigma_i + \Sigma_j|}{2^{N}\, |\Sigma_i|^{1/2}\, |\Sigma_j|^{1/2}} \right]$$

where $\delta_{ij} = \mu_i - \mu_j$, $\mu_i$ is the mean vector corresponding to class i, 
and $\Sigma_i$ is the covariance matrix corresponding to class i. 
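The computation described for BHATT can be sketched in a few lines of Python. This is only an illustration, not the Univac FORTRAN source: it assumes the pairwise term has the form H(i,j) = exp[-1/4 d^T (S_i + S_j)^-1 d - 1/2 ln(|S_i + S_j| / (2^N |S_i|^1/2 |S_j|^1/2))] with d the difference of the class means, and the helper names (`cholesky`, `logdet_and_quadform`, `h_pair`, `bdist`) are hypothetical. A Cholesky factorization stands in for the CHLSKY and DET subroutines.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def logdet_and_quadform(a, d):
    """Return (ln|A|, d^T A^-1 d) for a symmetric positive definite A."""
    low = cholesky(a)
    y = []  # forward substitution: solve L y = d, so d^T A^-1 d = y^T y
    for i in range(len(d)):
        y.append((d[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(len(d)))
    return logdet, sum(v * v for v in y)


def h_pair(mean_i, cov_i, mean_j, cov_j):
    """Interclass term H(i,j) under the assumed formula stated above."""
    n = len(mean_i)
    delta = [p - q for p, q in zip(mean_i, mean_j)]
    cov_sum = [[cov_i[r][c] + cov_j[r][c] for c in range(n)] for r in range(n)]
    logdet_sum, quad = logdet_and_quadform(cov_sum, delta)
    logdet_i, _ = logdet_and_quadform(cov_i, delta)
    logdet_j, _ = logdet_and_quadform(cov_j, delta)
    log_ratio = logdet_sum - n * math.log(2.0) - 0.5 * (logdet_i + logdet_j)
    return math.exp(-0.25 * quad - 0.5 * log_ratio)


def bdist(means, covs):
    """Total distance: sum of H(i,j) over all class pairs i < j."""
    m = len(means)
    return sum(h_pair(means[i], covs[i], means[j], covs[j])
               for i in range(m - 1) for j in range(i + 1, m))
```

With this form, two identical classes give H(i,j) = 1, and H(i,j) decreases toward 0 as the classes separate.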

2. SUBROUTINE BHATB1 

ABSTRACT 

This subroutine attempts to calculate the minimum B-average Bhattacharyya 
Distance using 1 Householder transformation to construct the B-matrix. 

USER'S INFORMATION: 

(Double Precision Version Only) 

In order to use this subroutine the following FORTRAN calling sequence must 
be given: 

CALL BHATB1 (COVAR, XMEAN, M, N, K, ITE, ALPHA) 

where 

COVAR (input)   is a real 3-dimensional array (M x N x N) containing 
                the M N x N covariance matrices. 

XMEAN (input)   is a real 2-dimensional array (M x N) and contains 
                the M N-dimensional mean vectors used as input. 

M (input)       is the number of classes under consideration (i.e. 
                the no. of covariance matrices and mean vectors). 

N (input)       is the dimension of the covariance matrices and the 
                mean vectors. 

K (input)       is the number of rows desired in the transformation 
                matrix B (which is K x N). 

ITE (input)     is 1 + (the no. of iterations required). 

ALPHA (input)   is a varying parameter in the iteration formula. 

OUTPUT OF SUBROUTINE BHATB1 

This subroutine has the following output: 

1. The transformation matrix B (which has dimension K x N) corresponding 
   to a particular value of the Householder generator F.* 

2. The value of the B-average interclass Bhattacharyya Distance 
   $H_B(i,j)$, i = 1,...,M-1; j = i+1,...,M. 

3. The N-dimensional F-vector which is the generator of the Householder 
   transformation $H = I - 2FF^{T}$ used in constructing the B-matrix 
   $B = (I_K \mid Z)H$. 

4. The value of the B-average Bhattacharyya Distance, $H_B$, corresponding 
   to the matrix B. 

5. The partial derivative vector $\partial H_B / \partial F$, which contains the partial 
   derivatives of $H_B$ with respect to the vector F. 

*See 'ALGORITHM' 
SUBROUTINES USED 

The following subroutines are in turn called by subroutine BHATB1: 

1. Subroutine MATMUL - calls SUPSUM and ORDER. 

2. Subroutine CHLSKY. 

3. Subroutine DET. 

ALGORITHM 

Subroutine BHATB1 attempts to compute the minimum B-average Bhattacharyya 
Distance using one Householder transformation to compute the B-matrix. The 
B-average Bhattacharyya Distance is given by the formula 

$$H_B = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} H_B(i,j)$$

where 

$$H_B(i,j) = \exp\left[ -\tfrac{1}{4}\, \hat{\delta}_{ij}^{T} (\hat{\Sigma}_i + \hat{\Sigma}_j)^{-1} \hat{\delta}_{ij} - \tfrac{1}{2} \ln \frac{|\hat{\Sigma}_i + \hat{\Sigma}_j|}{2^{K}\, |\hat{\Sigma}_i|^{1/2}\, |\hat{\Sigma}_j|^{1/2}} \right]$$

where $\hat{\delta}_{ij} = B(\mu_i - \mu_j)$ and $\hat{\Sigma}_i = B \Sigma_i B^{T}$, and B is a K x N matrix of rank K 
of the form $B = (I_K \mid Z)H$, where $H = I - 2FF^{T}$, $\|F\| = 1$. An initial guess $F_0$ for 
F is taken and the corresponding matrix 
$B = (I_K \mid Z)(I - 2F_0 F_0^{T})$ is computed. The corresponding value of 

$$H_B = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} H_B(i,j)$$

is also computed. 
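A minimal sketch of the B-matrix construction, assuming B = (I_K | Z)H with Z the K x (N-K) zero block, so that B is simply the first K rows of H = I - 2FF^T. The function names are hypothetical and the generator is normalized inside `householder`; this is an illustration, not the BHATB1 source.

```python
import math


def householder(f):
    """H = I - 2 F F^T for a generator F, normalized here to unit length."""
    norm = math.sqrt(sum(x * x for x in f))
    u = [x / norm for x in f]
    n = len(u)
    return [[(1.0 if r == c else 0.0) - 2.0 * u[r] * u[c] for c in range(n)]
            for r in range(n)]


def b_matrix(f, k):
    """B = (I_K | Z) H, i.e. the first K rows of the N x N matrix H."""
    return householder(f)[:k]


def gram(b):
    """B B^T, used to check that the rows of B are orthonormal."""
    return [[sum(x * y for x, y in zip(row_r, row_c)) for row_c in b]
            for row_r in b]
```

Because H is orthogonal, the rows of B are orthonormal, so `gram(b_matrix(f, k))` should be the K x K identity for any nonzero F.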
The steepest descent iterator is then applied to alter the value of F: 

$$F_{p+1} = F_p - \alpha\, \frac{\partial H_B}{\partial F_p},$$

where $\alpha$ is a varying parameter and is one of the inputs to the program, and 
$\partial H_B / \partial F_p$ is the partial derivative vector (derived analytically). The value of 
$F_{p+1}$ is then normalized so that $\|F_{p+1}\| = 1$. The B-matrix is recomputed with 
the new value of F. The corresponding value of $H_B$ is computed. This procedure 
is repeated (ITE - 1) number of times (8 seems to be a good value for ITE). 

Two points should be noted: 

(1) Whether $\partial H_B / \partial F$ is approximately 0. 

(2) Whether $\delta_B = H_B - \mathrm{BDIST}$ (BDIST being the total Bhattacharyya Distance) is 
    sufficiently small. 

The values of $\alpha$ and ITE (which are both inputs to this subroutine) 
should be altered accordingly in order to achieve the above 2 objectives. 

The value of F at which the minimum value of $H_B$ occurs is saved. Call 
it F1. 
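The iteration can be sketched as follows. This is a hedged illustration rather than the BHATB1 code: `objective` and `gradient` are hypothetical caller-supplied functions standing in for H_B and its analytically derived partial derivative vector, and the step is the normalized update F <- (F - alpha * dH/dF) / ||F - alpha * dH/dF||, repeated ITE - 1 times while the best F seen so far is kept.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def descend(objective, gradient, f0, alpha, ite):
    """Normalized steepest descent; returns the best F found and its value."""
    f = normalize(f0)
    best_f, best_val = f, objective(f)
    for _ in range(ite - 1):
        g = gradient(f)
        f = normalize([fi - alpha * gi for fi, gi in zip(f, g)])
        val = objective(f)
        if val < best_val:          # save the F giving the minimum so far
            best_f, best_val = f, val
    return best_f, best_val


# Toy stand-in for H_B: minimize x^2 + 3 y^2 on the unit circle.
def toy_objective(f):
    return f[0] ** 2 + 3.0 * f[1] ** 2


def toy_gradient(f):
    return [2.0 * f[0], 6.0 * f[1]]
```

On the toy objective, `descend(toy_objective, toy_gradient, [1.0, 1.0], 0.1, 50)` converges to the unit vector along the first axis. In practice alpha and ITE would be tuned as the text describes.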

3. SUBROUTINE BHATB2 

This subroutine attempts to compute the minimum B-average Bhattacharyya 
Distance using 2 Householder transformations. 

USER'S INFORMATION: 

(Double Precision Version) 

(1) In order to use this subroutine the following FORTRAN calling 
sequence must be given: 

CALL BHATB2 (COVAR, XMEAN, M, N, K, ITE, ALPHA) 

where 

COVAR, XMEAN, M, N, K, ITE, ALPHA 

have the same meanings as in SUBROUTINE BHATB1. 

(2) This subroutine reads in the value of F1 computed in the previous 
program (subroutine BHATB1). The data cards for F1 should have 
the format 5F16.8 (e.g. if F1 is 12-dimensional then F1 is 
punched on 3 data cards; the first 2 cards contain 5 components 
of F1 each and the last card contains 2 components of F1). 

These data cards for F1 are placed following the data cards for the 
covariance matrices and the mean vectors. 
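The 5F16.8 card layout can be illustrated with a short Python sketch. The function name `punch_cards` is hypothetical; it only mimics the fixed-width layout (five fields of width 16 with 8 decimal places per card) and is not part of the original program.

```python
def punch_cards(vector, per_card=5, width=16, decimals=8):
    """Format a vector as card images in 5F16.8 layout (one string per card)."""
    cards = []
    for start in range(0, len(vector), per_card):
        chunk = vector[start:start + per_card]
        cards.append("".join(f"{v:{width}.{decimals}f}" for v in chunk))
    return cards


def read_cards(cards, width=16):
    """Recover the values by slicing each card into fixed-width fields."""
    values = []
    for card in cards:
        for pos in range(0, len(card), width):
            values.append(float(card[pos:pos + width]))
    return values
```

A 12-dimensional F1 yields three cards holding 5, 5 and 2 components, matching the example in the text.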

(3) The value of F1 that is read in is then used to compute the 
Householder transformation $H_1 = I - 2\,F1\,F1^{T}$. The covariance matrices 
$\Sigma_i$ and the mean vectors $\mu_i$, i = 1,...,m are transformed into 
$H_1 \Sigma_i H_1^{T}$ and $H_1 \mu_i$. 

The number of Householder transformations by which the covariance matrices 
and the mean vectors have to be transformed is denoted by the variable 
IJ. 

For subroutine BHATB2 we require one Householder transformation to obtain 
$H_1 \Sigma_i H_1^{T}$ and $H_1 \mu_i$. 

The FORTRAN statement "IJ = 1" appears after the comment: 

"C IJ EQ. NO. OF HOUSEHOLDER TRANSFORMATIONS REQUIRED" 

OUTPUT OF SUBROUTINE BHATB2 

1. The vector F1, which is the generator of the Householder 
   transformation $H_1 = I - 2\,F1\,F1^{T}$. 

2. Same as subroutine BHATB1. 

ALGORITHM: 

Here each $\Sigma_i$ is replaced by $H_1 \Sigma_i H_1^{T}$ and each $\mu_i$ is replaced by $H_1 \mu_i$. 
The B matrix is then taken to be $B = (I_K \mid Z)(I - 2FF^{T})$, $\|F\| = 1$. An initial 
guess $F_0$ for F is made and the same procedure as in subroutine 
BHATB1 is applied. The value of F = F2 at which the minimum value of $H_B$ 
occurs is saved. 
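The pre-transformation step can be sketched as below, assuming the class statistics are replaced by H Sigma H^T and H mu with H = I - 2FF^T for a unit-length F. The function names are hypothetical and the matrices are plain nested lists; this is an illustration, not the BHATB2 source.

```python
def householder(f):
    """H = I - 2 F F^T; F is assumed to already have unit length."""
    n = len(f)
    return [[(1.0 if r == c else 0.0) - 2.0 * f[r] * f[c] for c in range(n)]
            for r in range(n)]


def transform_class(h, cov, mean):
    """Return (H cov H^T, H mean) for one class."""
    n = len(mean)
    h_cov = [[sum(h[r][k] * cov[k][c] for k in range(n)) for c in range(n)]
             for r in range(n)]
    new_cov = [[sum(h_cov[r][k] * h[c][k] for k in range(n)) for c in range(n)]
               for r in range(n)]
    new_mean = [sum(h[r][k] * mean[k] for k in range(n)) for r in range(n)]
    return new_cov, new_mean
```

For more than one saved generator (F1, F2, ...), the same function would be applied once per Householder transformation, in order.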

USING MORE THAN 2 HOUSEHOLDER TRANSFORMATIONS TO CONSTRUCT THE B-MATRIX: 

If more than 2 Householder transformations are required to compute the 
transformation matrix B, i.e. if $\delta_B = H_B - \mathrm{BDIST}$ is not small enough, then 
subroutine BHATB2 can be modified in the following way. For the B-matrix 
requiring 3 Householder transformations do the following: 

(1) Place the data cards containing the vector F2 (computed in the 
    previous program) following the data cards containing F1. 

(2) The statement following the comment "C ... EQ. NO. OF HOUSE- 
    HOLDER TRANSFORMATIONS REQUIRED ..." should be "IJ = 2". 

For J >= 4 Householder transformations required in computing the B-matrix: 

(1) the data cards for F1,...,F(J-1) should be placed after the data 
    cards for the covariance matrices and mean vectors; 

(2) the statement "IJ = 2" should be changed to "IJ = J-1". 

References 

1. H.P. Decell, Jr. and W.G. Smiley, III, "Householder Transformations and 
   Optimal Linear Combinations", Dept. of Mathematics, University of Houston. 

2. Salma K. Marani, Masters Thesis, "Bhattacharyya Distance, Householder 
   Transformations and Dimension Reduction in Pattern Recognition". 
I. INTRODUCTION 

This program reads multispectral scanner data from a Universal format 
tape and outputs an intermediate data set in card image format for use as an 
input data set in various data analysis development programs. The general 
capabilities are summarized as follows: 

1) decode the header record of the universal format tape, 

2) extract all or part of the channels on the universal format tape 
   (the channel numbers are relative), 

3) extract a rectangular region defined by first line (ISTART), 
   last line (ISTOP), and a line skip factor (ISKIP) and analogous 
   column or pixel values JSTART, JSTOP, and JSKIP (ISKIP or 
   JSKIP = X means X - 1 lines are skipped), 

4) extract and label any region defined by a non-rectangular field 
   or fields which is a subregion of 3), 

5) randomly select a percentage SAMPCT of the regions or fields which 
   were defined in 3 or 4. 

II. INPUT PARAMETERS 

SAMKEY   = -1  -only header record is decoded 
         =  0  -deterministic sample is extracted 
         =  1  -random sample is extracted 

SAMPCT   -if SAMKEY = 1, percent of data to be randomly sampled 

SEED     -if SAMKEY = 1, initial seed for random number generator 
          (must be a positive odd integer) 

ISTART   -beginning line for sample (absolute line number) 

ISTOP    -last line for sample 

ISKIP    -line skip factor (if ISKIP = 1, no lines are skipped) 

JSTART   -beginning pixel for sample (relative pixel number) 

JSTOP    -last pixel for sample 

JSKIP    -pixel skip factor (if JSKIP = 1, no pixels are skipped) 

NCHOUT   -number of channels to be output 

NCHLST   -array of relative channel numbers (NCHOUT channels) 
          to be output 

NFLDS    -number of non-rectangular fields to be defined (if 
          NFLDS = 0, then the rectangular region defined by 
          ISTART etc. is output) 

FID      -array containing 8 character field ID for each field 

NV       -array containing number of vertices for each non- 
          rectangular field (if the field is a quadrilateral, 
          then NV = 4) 

MINLIN   -array containing the minimum line number for each field 

MAXLIN   -array containing the maximum line number for each field 

IF(J,I)  -two dimensional array containing the line coordinates of 
          the Jth vertex of the Ith field for J = 1,...,NV+1 
          (the first coordinate is repeated as the NV+1 coordinate 
          as in ERIPS) 

JF(J,I)  -a two dimensional array containing the pixel coordinates 
          of the Jth vertex of the Ith field for J = 1,...,NV+1 
          (the first coordinate is repeated as the NV+1 
          coordinate as in ERIPS) 

          (the above vertices must be given in sequence such that 
          the interior of the field lies to the right. See 
          Appendix A for the ERIPS documentation for the FDLNINT 
          routine) 
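The rectangular sampling defined by the start, stop and skip parameters can be sketched as below. The function name `sample_grid` is hypothetical; the sketch assumes, as the parameter descriptions state, that a skip factor of 1 keeps every line or pixel and a factor of X keeps every Xth one starting at ISTART or JSTART.

```python
def sample_grid(istart, istop, iskip, jstart, jstop, jskip):
    """Return the line numbers and pixel numbers selected for output."""
    lines = list(range(istart, istop + 1, iskip))     # every ISKIP-th line
    pixels = list(range(jstart, jstop + 1, jskip))    # every JSKIP-th pixel
    return lines, pixels
```

For example, ISTART = 10, ISTOP = 20, ISKIP = 5 selects lines 10, 15 and 20, so 4 lines are skipped between consecutive selected lines.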
IV. INPUT FORMAT FOR PARAMETERS 

REQ:   SAMKEY                        (10X, I10) 

REQ:   ISTART                        (10X, I10) 
       ISTOP 
       ISKIP 
       JSTART 
       JSTOP 
       JSKIP 

OPT:   SAMPCT                        (10X, F10.0) 
       SEED                          (10X, I10) 

REQ:   NCHOUT                        (10X, I10) 
       NCHLST                        (10X, 16I2) 

REQ:   NFLDS                         (10X, I10) 

       for I = 1,...,NFLDS (if NFLDS > 0) 

OPT:   FID(I), NV(I), 
       MINLIN(I), MAXLIN(I)          (A8, 2X, 3I5) 
       IF(J,I)                       (11I5) 
       JF(J,I)                       (11I5) 


V. FORMAT OF INPUT DATA SET 

The Input Data Set is read from Fortran unit 1 (FT01F001) by the READ 
routine. The Input Data Set has the format of a Universal Format Image Data 
Tape described in NASA Earth Resources Data Format Control Book (TR-543). 

VI. FORMAT OF OUTPUT DATA SET 

For each NCH dimensional pixel (X(I), I=1,...,NCH) selected 
for output, the following record (80 bytes) is written onto Fortran unit 
3 (FT03F001): 

LINE NUMBER 
PIXEL NUMBER 
FID (blank if not applicable) 
X(NCHLST(1)) 
X(NCHLST(2)) 
   . 
   . 
X(NCHLST(NCHOUT)) 

The format is (2I4, A8, 16I4). The logical record length is 80 bytes and 
the BLKSIZE is determined by the JCL card defining Fortran unit 3 (FT03F001). 

VII. SUBROUTINES 

MIX     -arranges data by pixel rather than by channel 

RANDU   -random number generator (IBM SSP) 

FDLNIN  -determines intersection of a non-rectangular field and 
         a scan line (Fortran version of PL/1 ERIPS utility routine) 

READ    -assembly language (360 OS) binary read routine (Hinman) 
APPENDIX A 

LEVEL 21.8 (JUN 74)                  OS/360 FORTRAN H 




COMPILER OPTIONS - NAME=  MAIN,OPT=02,LINECNT=50,SIZE=0000K, 
                   SOURCE,EBCDIC,NOLIST,NODECK,LOAD,MAP,NOEDIT,NOID,NOXREF 

      INTEGER SEED 
      INTEGER BEGVID,RECLNG,RECBND,ANCLNG,INDX(16),XXXX(2500), 
     *        ONE,SAMKEY,SAMSIZ,NCHLST(16) 
      LOGICAL*1 Z(3060),Z2(2),X(10000),OUT(16) 
      INTEGER*2 ZINT2,NREC,LIN,XX(5000) 
      DOUBLE PRECISION OVER,BLANK,DXXX,FID 
      DIMENSION FID(50),NV(50),MINLIN(50),MAXLIN(50),IF(12,50), 
     *          JF(12,50),INT(11),OVER(1000) 
      DATA BLANK/'        '/ 
      DATA DXXX/'$$$$$$$$'/ 
      DATA OUT/16*' '/,SAMSIZ/0/,LIN/0/ 
      EQUIVALENCE (ZINT2,Z2(1)),(NREC,Z(1)),(LIN,Z(7)), 
     *            (X(1),XX(1)),(X(1),XXXX(1)) 

C     READ HEADER RECORD AND DECODE THE FOLLOWING VARIABLES 
C 
C     NCH    - NUMBER OF CHANNELS 
C     NCH1   - NUMBER OF CHANNELS ON FIRST RECORD OF BAND 
C     NCH2   - NUMBER OF CHANNELS ON OTHER RECORDS OF BAND 
C     RECLNG - RECORD LENGTH 
C     RECBND - NUMBER OF RECORDS PER BAND 
C     NPIX   - NUMBER OF PIXELS PER CHANNEL PER BAND 
C     ANCLNG - LENGTH OF ANCILLARY BLOCK ON FIRST RECORD 
C     BEGVID - BEGIN VIDEO BYTE WITHIN SCAN 
C     INDX   - ARRAY OF INDICES FOR BEGINNING BYTE 
C              WITHIN THE APPROPRIATE RECORD 

      CALL READ(Z,LRCLG) 
      IF (LRCLG.LT.0) GO TO 999 
      ZINT2 = 0 
      Z2(2) = Z(90) 
      NCH = ZINT2 
      Z2(1) = Z(92) 
      Z2(2) = Z(93) 
      BEGVID = ZINT2 
      Z2(1) = Z(96) 
      Z2(2) = Z(97) 
      NPIX = ZINT2 
      Z2(1) = Z(100) 
      Z2(2) = Z(101) 
      RECLNG = ZINT2 
      ZINT2 = 0 


Book: Program Documentation 

Large Area Crop Inventory Experiment (LACIE) 

IIAXPFLI-ICAXPFLI 
Date 9/11/75 
Rev 
Page 1 


REFERENCES 

1. Program Name - FDLNINT 

2. Programmer - R. J. Decker 

3. Language - PL/1 

4. LINKEDIT Attributes - NCAL 

5. Inputs - Scan Line Number 

6. Outputs - Intercepts (pixel numbers) of scan line and field sides 

7. Special Items - Calling sequence: 

      CALL FDLNINT(P,L); 
      where P = pointer to field definition table 
            L = 11 element vector declared FIXED BIN (15) 

   L(11) should be loaded with the scan line number. 


On return, the L vector will contain the ordered pixel intercepts (e.g., 
a return of | 5 | 7 | 12 | 20 | 0 | ... | 0 | indicates pixels 5 
through 7 and pixels 12 through 20 are contained in the field). 
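Decoding the returned vector can be sketched as below. The function name `intercept_ranges` is hypothetical, and the sketch assumes, following the example in the text, that the intercepts come in (first pixel, last pixel) pairs and that a zero marks the end of the used entries.

```python
def intercept_ranges(l_vector):
    """Turn an ordered intercept vector into (first, last) pixel ranges."""
    ranges = []
    for k in range(0, len(l_vector) - 1, 2):
        first, last = l_vector[k], l_vector[k + 1]
        if first == 0:          # zero padding: no more intercept pairs
            break
        ranges.append((first, last))
    return ranges
```

For the example vector, the result is the two ranges 5 through 7 and 12 through 20.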


FUNCTIONAL DESCRIPTION 

This subroutine will return the pixel numbers of those pixels on a given line that 
are contained within the boundaries of a field. 

DETAILED LOGIC DESCRIPTION 

IIAXPFLI examines the number of vertices of the input field to determine if the 
field is a line-field or a polygon. If the input field is a line-field, then 
the intercepts are determined as follows: 

The intercept of the line-field and L-0.5 is calculated as 
$P = (X_2 - X_1)(L - 0.5 - Y_1)/(Y_2 - Y_1) + X_1$. This calculation determines the 
projection of the intercept of L-0.5 onto L. The intercept of the line-field 
and L+0.5 is calculated as $P = (X_2 - X_1)(L + 0.5 - Y_1)/(Y_2 - Y_1) + X_1$. 
This calculation determines the projection of the intercept 
of L+0.5 onto L. These projections are examined to determine which is the 
left one ($P_L$) and which is the right one ($P_R$). $P_L$ is set to the integral 
value of $P_L + 0.5$ and $P_R$ is set to the integral value of $P_R + 0.4999$. 
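The line-field case can be sketched in Python as follows. This is an illustration under stated assumptions, not the PL/1 routine: the intercept formula is taken to be P = (X2 - X1)(L +/- 0.5 - Y1)/(Y2 - Y1) + X1, "integral value" is taken to mean truncation of a nonnegative number, the endpoints are assumed to lie on different lines (Y2 != Y1), and the function name is hypothetical.

```python
def line_field_intercepts(x1, y1, x2, y2, line):
    """Return (P_L, P_R), the pixel range covered on scan line `line`."""
    def project(y):
        # X coordinate where the line-field crosses height y
        return (x2 - x1) * (y - y1) / (y2 - y1) + x1

    p_a, p_b = project(line - 0.5), project(line + 0.5)
    p_left, p_right = min(p_a, p_b), max(p_a, p_b)
    return int(p_left + 0.5), int(p_right + 0.4999)
```

For a line-field from (X, Y) = (0, 0) to (20, 10), scan line 5 is crossed between X = 9 and X = 11, so the routine reports pixels 9 through 11.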

If the field is a polygon, then IIAXPFLI finds the pixel intercepts of a scan line 
and the sides of the input field. 

There are three distinct cases and each is handled separately: (1) the scan line 
intersects a side but not at the endpoints (i.e., vertices), (2) the scan line 
intersects a vertex that is not an end of a horizontal line, and (3) the scan line 
is concurrent with a horizontal side of the field. 

FUNCTIONAL FLOWCHART 

See Figure 1. 
[Figure 1. Functional flowchart of IIAXPFLI] 




      Z2(2) = Z(102) 
      NCH2 = ZINT2 
      ZINT2 = 0 
      Z2(2) = Z(104) 
      RECBND = ZINT2 
      Z2(1) = Z(105) 
      Z2(2) = Z(106) 
      ANCLNG = ZINT2 
      Z2(1) = Z(1785) 
      Z2(2) = Z(1786) 
      NCH1 = ZINT2 
      ICT = 0 
      DO 20 I=1,NCH1 
      ICT = ICT+1 
   20 INDX(I) = ANCLNG+2+(I-1)*NPIX+1 
      IF (RECBND.EQ.1) GO TO 40 
      DO 30 I=2,RECBND 
      DO 30 J=1,NCH2 
      ICT = ICT+1 
   30 INDX(ICT) = 2+(J-1)*NPIX+1 
   40 WRITE (6,200) NCH,NPIX,RECLNG,NCH1,NCH2,RECBND,ANCLNG,BEGVID 
      WRITE (6,201) (I,INDX(I),I=1,NCH) 
      WRITE (6,202) Z 
  200 FORMAT(1H1,' NCH    = ',I6,/, 
     *           ' NPIX   = ',I6,/, 
     *           ' RECLNG = ',I6,/, 
     *           ' NCH1   = ',I6,/, 
     *           ' NCH2   = ',I6,/, 
     *           ' RECBND = ',I6,/, 
     *           ' ANCLNG = ',I6,/, 
     *           ' BEGVID = ',I6) 
  201 FORMAT (1H ,'INDX(',I2,') = ',I6) 
  202 FORMAT(100(/,5(2X,10Z2))) 




C 
C     READ SAMPLING PARAMETERS 
C 
C     SAMKEY - ONLY HEADER RECORD IS DECODED 
C            - DETERMINISTIC SAMPLE 
C            - RANDOM SAMPLE 
C     SAMPCT - PERCENTAGE OF DATA TO BE SAMPLED RANDOMLY 
C     SEED   - SEED FOR RANDOM NUMBER GENERATOR 
C     ISTART - BEGIN LINE FOR SAMPLE (ABSOLUTE LINE NUMBER) 
C     ISTOP  - LAST LINE FOR SAMPLE 
C     ISKIP  - LINE SKIP FACTOR (IF ISKIP=0, NO LINES ARE SKIPPED) 
C     JSTART - BEGIN PIXEL FOR SAMPLE (RELATIVE PIXEL NUMBER) 
C     JSTOP  - LAST PIXEL FOR SAMPLE 
C     JSKIP  - PIXEL SKIP FACTOR (IF JSKIP=0, NO PIXELS ARE SKIPPED) 
C     NCHOUT - NUMBER OF CHANNELS TO BE OUTPUT 
C     NCHLST - ARRAY OF CHANNEL IDS TO BE OUTPUT (RELATIVE) 
C 




      READ (5,1000) SAMKEY 
      WRITE (6,1007) SAMKEY 
      IF (SAMKEY) 41,42,42 
   41 STOP 
   42 READ(5,1000) ISTART,ISTOP,ISKIP,JSTART,JSTOP,JSKIP 
      WRITE (6,1008) ISTART,ISTOP,ISKIP,JSTART,JSTOP,JSKIP 
      IF(SAMKEY) 44,44,43 
   43 READ(5,1002) SAMPCT,SEED 
      IX=SEED 
      WRITE(6,1009) SAMPCT,SEED 
      SAMPCT = SAMPCT/100. 
   44 READ (5,1000) NCHOUT 
      READ(5,1003) (NCHLST(I),I=1,NCHOUT) 
 1000 FORMAT(10X,I10) 
 1002 FORMAT(10X,F10.0,/,10X,I10) 
 1003 FORMAT(10X,16I2) 
 1007 FORMAT(1H1,'SAMKEY = ',I10) 
 1008 FORMAT (1H ,'ISTART = ',I10,/, 
     *        ' ISTOP  = ',I10,/, 
     *        ' ISKIP  = ',I10,/, 
     *        ' JSTART = ',I10,/, 
     *        ' JSTOP  = ',I10,/, 
     *        ' JSKIP  = ',I10) 
 1009 FORMAT(' SAMPCT = ',F10.2,/,' SEED   = ',I10) 
 1010 FORMAT(' NCHOUT = ',I10) 
 1011 FORMAT(' NCHLST = ',16I5) 
      READ(5,2000) NFLDS 
      WRITE(6,2001) NFLDS 
 2000 FORMAT (10X,I10) 
 2001 FORMAT (1H ,'NFLDS = ',I10) 
      IF (NFLDS) 440,440,438 
  438 DO 439 NF=1,NFLDS 
      READ(5,2002) FID(NF),NV(NF),MINLIN(NF),MAXLIN(NF) 
      NVS=NV(NF) 
      READ(5,2003) (IF(J,NF),J=1,NVS) 
      READ(5,2003) (JF(J,NF),J=1,NVS) 
      DO 605 II=1,NVS 










      J=NVS-II+1 
      J1=J+1 
      IF(J1,NF)=IF(J,NF) 
  605 JF(J1,NF)=JF(J,NF) 
      IF(1,NF)=IF(NVS,NF) 
      JF(1,NF)=JF(NVS,NF) 
      IF(NVS+2,NF)=IF(3,NF) 
      JF(NVS+2,NF)=JF(3,NF) 
      NV3=NVS+2 
      WRITE (6,2004) NF 
      WRITE (6,2005) FID(NF),NV(NF) 
      WRITE (6,2006) (IF(J,NF),J=1,NV3) 
  439 WRITE (6,2007) (JF(J,NF),J=1,NV3) 
 2002 FORMAT (A8,2X,3I5) 
 2003 FORMAT (11I5) 
 2004 FORMAT(5X,'FIELD = ',I10) 
 2005 FORMAT(5X,'FIELD ID = **',A8,'**',/, 
     *       5X,'NV     = ',I10,/, 
     *       5X,'MINLIN = ',I10,/, 
     *       5X,'MAXLIN = ',I10) 
 2006 FORMAT(5X,'LINE  = ',12I5) 
 2007 FORMAT(5X,'PIXEL = ',12I5) 
  440 CONTINUE 
C 
C     WRITE DATA INTO EGB FORMAT 
C 
      SAMSIZ = 0 
   50 CALL READ(Z,LRCLG) 
      IF (LRCLG.LT.0) GO TO 999 
      IF(NREC-1) 55,55,60 
   55 LINE = LIN 
      IF (LINE.GT.ISTOP) GO TO 999 
      LS=LINE-ISTART 
      WRITE (6,307) LINE 
  307 FORMAT(20X,I10) 
      IF(LS.GE.0) GO TO 552 
      IF (RECBND.LE.1) GO TO 50 
  550 CALL READ(Z,LRCLG) 
      IF(LRCLG.LT.0) GO TO 999 
      IF(NREC-1) 55,55,550 

J 




  552 LSM=LS/ISKIP*ISKIP-LS 
      IF(LSM.NE.0) GO TO 550 
  555 DO 56 I=1,2500 
   56 XXXX(I)=0 
      KREC=1 
      NCT=0 
      DO 57 I=1,NCH1 
      NCT=NCT+1 
      IND=INDX(NCT) 
   57 CALL MIX(Z(IND),NCT,NPIX,X,NCH) 
      IF(NCH2.EQ.0) GO TO 7325 
      GO TO 50 
   60 KREC=KREC+1 
      DO 61 I=1,NCH2 
      NCT=NCT+1 
      IND=INDX(NCT) 
   61 CALL MIX(Z(IND),NCT,NPIX,X,NCH) 
      IF(KREC.LT.RECBND) GO TO 50 
C     WRITE DATA TO OUTPUT DATA SET 
 7325 CONTINUE 
      IF (NFLDS) 675,675,655 
  655 DO 660 IP=1,NPIX 
  660 OVER(IP)=DXXX 
      DO 665 NF=1,NFLDS 
      CALL FDLNIN(LINE,NV(NF),IF(1,NF),JF(1,NF),INT,MINLIN(NF), 
     *            MAXLIN(NF)) 
      WRITE (6,6660) LINE,NF,INT 
 6660 FORMAT (30X,2I10,11I5) 
      DO 670 IM=1,5 
      K=INT(2*IM-1) 
      KK=INT(2*IM) 
      IF (K.EQ.0) GO TO 670 
      DO 669 JK=K,KK 
  669 OVER(JK)=FID(NF) 
  670 CONTINUE 
  665 CONTINUE 


  675 CONTINUE 
      DO 80 I=JSTART,JSTOP,JSKIP 
      IF(NFLDS.LE.0) GO TO 680 
      IF(OVER(I).EQ.DXXX) GO TO 80 
  680 CONTINUE 
      IF(SAMKEY) 75,75,70 
   70 CALL RANDU(IX,IY,YFL) 
      IX = IY 
      IF(YFL.GT.SAMPCT) GO TO 80 
   75 DO 78 J=1,NCHOUT 
   78 OUT(J)=X((I-1)*NCH+NCHLST(J)) 
C 
      IF (NFLDS.LE.0) OVER(I) = BLANK 
      WRITE (3,300) LINE,I,OVER(I),(OUT(J),J=1,NCHOUT) 
  300 FORMAT (2I4,A8,16I4) 
      SAMSIZ = SAMSIZ + 1 
   80 CONTINUE 
C 
C 
      WRITE (6,301) LINE,NREC 
  301 FORMAT (2X,2I5) 
C 
      GO TO 50 
  999 ONE=-1 
      DO 90 I=1,100 
   90 WRITE (3,400) ONE 
  400 FORMAT(I4,76X) 
      WRITE (6,405) SAMSIZ 
  405 FORMAT(' SAMSIZE = ',I10) 
      ENDFILE 3 
      REWIND 3 
      STOP 
      END 





COMPILER OPTIONS - NAME=  MAIN,OPT=02,LINECNT=50,SIZE=0000K, 
                   SOURCE,EBCDIC,NOLIST,NODECK,LOAD,MAP,NOEDIT,NOID,NOXREF 

      SUBROUTINE RANDU(IX,IY,YFL) 
      IY=IX*65539 
      IF(IY) 5,6,6 
    5 IY=IY+2147483647+1 
    6 YFL=IY 
      YFL=YFL*.4656613E-9 
      RETURN 
      END 
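The behavior of RANDU can be sketched in Python for readers without a FORTRAN compiler. This is a hedged emulation, not part of the listing: it assumes 32-bit two's-complement integer arithmetic, in which the product IX*65539 silently wraps and a negative result is corrected by adding 2**31, and the function name `randu` is hypothetical.

```python
def randu(ix):
    """One RANDU step: returns (iy, yfl) with yfl roughly uniform on [0, 1)."""
    iy = (ix * 65539) & 0xFFFFFFFF   # keep the low 32 bits of the product
    if iy >= 2 ** 31:                # reinterpret as a signed 32-bit integer
        iy -= 2 ** 32
    if iy < 0:                       # the IY=IY+2147483647+1 branch
        iy += 2147483647 + 1
    yfl = iy * 0.4656613e-9          # scale by approximately 2**-31
    return iy, yfl
```

As in the main program, the returned `iy` is fed back in as the next `ix`. The requirement in the parameter list that SEED be a positive odd integer applies to the first call.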




LEVEL 21.8 (JUN 74)                  OS/360 FORTRAN H 

COMPILER OPTIONS - NAME=  MAIN,OPT=02,LINECNT=50,SIZE=0000K, 
                   SOURCE,EBCDIC,NOLIST,NODECK,LOAD,MAP,NOEDIT,NOID,NOXREF 

      SUBROUTINE FDLNIN (L,NV,Y,X,INT,MINLIN,MAXLIN) 
      INTEGER Y(12),X(12),INT(11),DUM 
      REAL PTS(10) 
      NV1=NV+1 
      DO 10 I=1,10 
   10 INT(I)=0 
      IF (L.LT.MINLIN.OR.L.GT.MAXLIN) RETURN 
      DO 15 I=1,10 
   15 PTS(I)=0. 
      IPT=0 
      DO 12 I=2,NV1 
      IF(.NOT.(L.GT.MIN0(Y(I),Y(I+1)).AND.L.LT.MAX0(Y(I),Y(I+1)))) 
     *   GO TO 12 
      IPT=IPT+1 
      PTS(IPT)=(FLOAT(L-Y(I))*(X(I+1)-X(I)))/ 
     *   (FLOAT(Y(I+1)-Y(I)))+FLOAT(X(I)) 
   12 CONTINUE 
      DO 14 I=2,NV1 
      IF(.NOT.(L.EQ.Y(I).AND.L.NE.Y(I-1).AND.L.NE.Y(I+1)))GO TO 14 
      IPT=IPT+1 
      PTS(IPT)=FLOAT(X(I)) 
      IF(.NOT.((L.LT.Y(I-1).AND.L.LT.Y(I+1)).OR.(L.GT.Y(I-1).AND. 
     *   L.GT.Y(I+1))))GO TO 14 
      IPT=IPT+1 
      PTS(IPT)=PTS(IPT-1) 
   14 CONTINUE 
      J=1 
   50 J=J+1 
      IF(J.GT.NV) GO TO 100 
      IF(Y(J).NE.L) GO TO 50 
      IF(Y(J+1).NE.L) GO TO 50 
      IF(X(J+1).LT.X(J)) GO TO 
      IF(Y(J-1).GE.L) GO TO 20 
      IPT=IPT+1 
      PTS(IPT)=X(J) 
      IF(Y(J+2).GE.L) GO TO 
      IPT=IPT+1 
      PTS(IPT)=X(J+1) 
      J=J+1 
      GO TO 50 
      IF(Y(J-1).LE.L) GO TO 
      IPT=IPT+1 
      PTS(IPT)=X(J) 
      IF(Y(J+2).LE.L) GO TO 
      IPT=IPT+1 
      PTS(IPT)=X(J+1) 





      J=J+1 
      GO TO 50 
  100 CONTINUE 
      IPT1=IPT-1 
      DO 30 K=1,IPT1 
      K1=K+1 
      DO 30 I=K1,IPT 
      IF (PTS(I).GE.PTS(K)) GO TO 30 
      DUM=PTS(I) 
      PTS(I)=PTS(K) 
      PTS(K)=DUM 
   30 CONTINUE 
      IF(IPT.EQ.2) GO TO 103 
      IPT2=IPT-2 
      DO 40 I=2,IPT2,2 
      IF(PTS(I).NE.PTS(I+1)) GO TO 40 
      PTS(I)=-1 
      PTS(I+1)=-1 
   40 CONTINUE 
  103 K=0 
      DO 110 I=1,IPT,2 
      IF (PTS(I).EQ.-1) GO TO 105 
      K=K+1 
      INT(K)=PTS(I)+.499 
  105 CONTINUE 
      IF(PTS(I+1).EQ.-1) GO TO 110 
      K=K+1 
      INT(K)=PTS(I+1)+.500 
  110 CONTINUE 
      IPT2=IPT-2 
      DO 60 I=2,IPT2,2 
      IF (INT(I).NE.INT(I+1)) GO TO 60 
      INT(I)=0 
      INT(I+1)=0 
   60 CONTINUE 
      IPT1=IPT-1 
      DO 70 K=1,IPT1 
      K1=K+1 
      DO 65 I=K1,IPT 
      IF(.NOT.(INT(I).NE.0.AND.INT(I).LT.INT(K).OR.INT(K).EQ.0))GO TO 65 
      DUM=INT(I) 
      INT(I)=INT(K) 
      INT(K)=DUM 
   65 CONTINUE 
   70 CONTINUE 
      RETURN 
      END 




*        READ - READ ERIPS LOG TAPE 
* 
*        LARRY HINMAN, EARTH RESOURCES PROGRAM OFFICE, PHILCO-FORD 
*        CALL RDLOGT (BUFADR, RECLNG) 
* 
READ     CSECT 
         SAVE  (14,12)                 SAVE REGS 
         LR    2,15                    SET BASE 
         USING READ,2                  ASM BASE 
         LA    3,SAVE                  NEW SAVE AREA ADDR 
         ST    3,8(13)                 LSA 
         ST    13,4(3)                 HSA 
         LR    13,3                    SAVE AREA ADDR 
* 
         L     3,0(1)                  ADDR OF BUFFER 
         L     5,4(1)                  ADDR OF WORD FOR RECORD LENGTH 
         LA    7,TAPEDCB               ADDR OF DCB 
         USING IHADCB,7                SECOND BASE 
         TM    DCBOFLGS,X'10'          TEST FOR OPEN 
         BO    INPUT                   DCB IS OPEN 
* 
         OPEN  (TAPEDCB,,LPDCB,OUTPUT) INIT DCB'S 
* 
INPUT    DS    0H                      READ RECORDS FROM DCB 
         READ  INDECB,SF,TAPEDCB,(3),'S'  READ RECORD 
         CHECK INDECB                  CHECK READ 
* 
RTNO     L     8,INDECB+16             IOB ADDR 
         LH    4,DCBBLKSI              RECORD SIZE READ 
         SH    4,14(8)                 LENGTH OF RECORD READ 
* 
         DS    0H                      SET RECORD LENGTH IN BYTES 
         ST    4,0(5)                  RECORD LENGTH TO CALLER 
* 
RETURN   DS    0H                      RETURN LOGIC 
         L     13,SAVE+4               OLD SAVE AREA ADDR 
         RETURN (14,12),T              RETURN TO CALLER 
* 
ENDDATA  DS    0H                      END OF INPUT 
         MVI   0(5),X'FF'              SET RECORD LENGTH TO NEGATIVE 
         B     RETURN                  RETURN TO CALLER 
* 
ERROR    DS    0H                      READ ERROR OCCURRED 
         ST    0,FIELD                 DECB ADDR 
         UNPK  TMP(9),FIELD(5)         CONVERT TO PSEUDO-EBCDIC 
         TR    TMP(8),TABLE-240        CONVERT TO EBCDIC 
         MVC   ERRMSG+40(8),TMP        MOVE TO OUTPUT BUFFER 
         ST    1,FIELD                 ERROR BITS AND DCB ADDR 
         UNPK  TMP(9),FIELD(5)         CONVERT TO PSEUDO-EBCDIC 
         TR    TMP(8),TABLE-240        CONVERT TO EBCDIC 
         MVC   ERRMSG+60(8),TMP        MOVE TO OUTPUT BUFFER 
         PUT   LPDCB,ERRMSG            OUTPUT ERROR 
         BR    14                      RETURN TO SYSTEM 
* 
*        DATA 
* 
         DS    0F 
FIELD    DS    CL5 
TMP      DS    CL9 
TAPEDCB  DCB   MACRF=R,RECFM=U,BLKSIZE=8800,EODAD=ENDDATA,             X
               DSORG=PS,DDNAME=FT01F001,SYNAD=ERROR,DEVD=TA,EROPT=ACC 
LPDCB    DCB   DSORG=PS,MACRF=PM,BLKSIZE=133,LRECL=133,RECFM=FBM,      X
               DDNAME=LP 
         DS    0F 
ERRMSG   DC    X'09',CL132'**READ ERROR, RECORD IGNORED**' 
         DS    0F 
TABLE    DC    C'0123456789ABCDEF' 
* 
SAVE     DS    18F 
* 
         DCBD  DSORG=PS 



Characterizations of Linear Sufficient Statistics 

by 

B. Charles Peters, Jr., Richard Redner, 
and Henry P. Decell, Jr. 

University of Houston 

August, 1976 

Report #59 
NAS-9-15000 



We develop a necessary and sufficient condition that there exist
a continuous linear sufficient statistic T for a dominated collection
of totally finite measures defined on the Borel field
generated by the open sets of a Banach space X. In particular,
corollary necessary and sufficient conditions that there exist a
rank k linear sufficient statistic T for any finite collection of
probability measures having n-variate normal densities are given.
... there exists a rank k linear sufficient statistic T (as well as
an associated statistic T itself).


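1. Introduction. If $W$ is a Banach space, $\mathcal{B}(W)$ will denote the Borel
field generated by the open sets of $W$. The totally finite measures
defined on $\mathcal{B}(W)$ will be denoted by $\mathcal{M}(W)$. For $\mu, \lambda \in \mathcal{M}(W)$ we will
write $\mu \ll \lambda$ provided $B \in \mathcal{B}(W)$ and $\lambda(B) = 0$ implies $\mu(B) = 0$.

Whenever $\mu \ll \lambda$, $[d\mu/d\lambda]$ will denote the equivalence class of
Radon-Nikodym derivatives of $\mu$ with respect to $\lambda$ [2], [3]. If
$\mathcal{M}_0 \subset \mathcal{M}(W)$, $\mathcal{M}_0$ will be called a dominated (by $\lambda$) set of measures
provided there exists $\lambda \in \mathcal{M}(W)$ ($\lambda$ not necessarily in $\mathcal{M}_0$) such that
$\mu \in \mathcal{M}_0$ implies $\mu \ll \lambda$. We will call $\mathcal{M}_0 \subset \mathcal{M}(W)$ equivalent to $\lambda$
($\mathcal{M}_0 \equiv \lambda$) provided $\mathcal{M}_0$ is dominated by $\lambda$ and $\mu(B) = 0$ for each
$\mu \in \mathcal{M}_0$ implies $\lambda(B) = 0$.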


If X and Y are Banach spaces and $T: X \to Y$ then, following the notation
in [3], we write $f\,(\in)\,T^{-1}(\mathcal{B}(Y))$ provided $f: X \to R$ (= Reals) and f is
$(T^{-1}(\mathcal{B}(Y)), \mathcal{B}(R))$-measurable (as well as $(\mathcal{B}(X), \mathcal{B}(R))$-measurable).

In [3], Halmos and Savage develop an approach to sufficient statistics. 
Their results provide an alternate definition, within a very general mathema- 
tical framework, of statistical sufficiency for dominated sets of measures. 
This alternate definition is particularly suitable to the development of the 
results in this paper. We will require the statement (Theorem 1.) of the 
alternate definition in the setting of Banach spaces. 

In all that follows X and Y will be Banach spaces, T a linear
continuous mapping of X onto Y, and $\mathcal{M}_0 \subset \mathcal{M}(X)$ a dominated set of
measures.

Theorem 1. (Halmos-Savage [3]) A necessary and sufficient condition that
T be a sufficient statistic for $\mathcal{M}_0$ is that there exist $\lambda \in \mathcal{M}(X)$ such
that $\mathcal{M}_0 \equiv \lambda$ and $g_\mu \in [d\mu/d\lambda]$ such that $g_\mu\,(\in)\,T^{-1}(\mathcal{B}(Y))$ for each
$\mu \in \mathcal{M}_0$.

In this paper our particular concern will be that of developing 
necessary and sufficient conditions that a linear continuous mapping T 
of X onto Y be a sufficient statistic for a dominated set of measures $\mathcal{M}_0$.

In Theorem 2, we will require an additional condition on T which, to
the best of our knowledge, is generally unavoidable. We will require
that the kernel of T (= ker T) be complemented, in the sense that there
exists a closed subspace S of X such that $X = \ker T \oplus S$ (e.g., if
X is a Hilbert space, take $S = (\ker T)^{\perp}$).

In Theorem 3 we will show that the condition $X = \ker T \oplus S$ may be
relaxed whenever $[d\mu/d\lambda]$ contains a continuous representative.

The results we develop are finally used to establish necessary and sufficient
conditions that a linear statistic $B: R^n \to R^k$ ($k \le n$) be sufficient for a
finite collection of probability measures having n-variate normal densities.

2. Principal Results. In all that follows we will assume that X and Y
are Banach spaces, $T: X \to Y$ is a linear continuous mapping of X
onto Y, and $\mathcal{M}_0 \subset \mathcal{M}(X)$ is a dominated set of measures.

Theorem 2. Let $X = \ker T \oplus S$ for some closed subspace S of X. A
necessary and sufficient condition that T be a sufficient statistic for
$\mathcal{M}_0$ is that there exist $\lambda \in \mathcal{M}(X)$ such that $\mathcal{M}_0 \equiv \lambda$ and

$$\ker T \subset \{y : g_\mu(x + y) = g_\mu(x),\ x \in X\}$$

for each $\mu \in \mathcal{M}_0$ and some $g_\mu \in [d\mu/d\lambda]$.
Proof: If T is a sufficient statistic for $\mathcal{M}_0$ and $\mu \in \mathcal{M}_0$ then
there exists (Theorem 1) $\lambda$ and $g_\mu \in [d\mu/d\lambda]$ such that $g_\mu\,(\in)\,T^{-1}(\mathcal{B}(Y))$.
Suppose $y \in \ker T$ and, without loss of generality, there exists $x_0 \in X$
such that $g_\mu(x_0 + y) < g_\mu(x_0)$. Choose $r \in R$ such that $g_\mu(x_0 + y) < r < g_\mu(x_0)$.
Since $g_\mu^{-1}(-\infty, r)$ and $g_\mu^{-1}(r, \infty)$ are elements of $\mathcal{B}(X)$ and $g_\mu\,(\in)\,T^{-1}(\mathcal{B}(Y))$,
it follows that there exist $B_1$ and $B_2 \in \mathcal{B}(Y)$ such that
$x_0 + y \in g_\mu^{-1}(-\infty, r) = T^{-1}(B_1)$ and $x_0 \in g_\mu^{-1}(r, \infty) = T^{-1}(B_2)$. Now, since T
is linear and $y \in \ker T$, $T(x_0) \in B_1 \cap B_2 = \emptyset$, which is absurd.

Conversely, suppose $\mathcal{M}_0 \equiv \lambda$, $\mu \in \mathcal{M}_0$ and
$\ker T \subset \{y : g_\mu(x + y) = g_\mu(x),\ x \in X\}$ for some $g_\mu \in [d\mu/d\lambda]$. We need
only show (according to Theorem 1) that $g_\mu\,(\in)\,T^{-1}(\mathcal{B}(Y))$. It will only be
necessary to show that for $r \in R$ there exists $B_r \in \mathcal{B}(Y)$ such that
$g_\mu^{-1}(-\infty, r) = T^{-1}(B_r)$. We will show first that
$g_\mu^{-1}(-\infty, r) = T^{-1}T(g_\mu^{-1}(-\infty, r) \cap S)$ and then that
$T(g_\mu^{-1}(-\infty, r) \cap S) \in \mathcal{B}(Y)$.

If $x \in T^{-1}(T(g_\mu^{-1}(-\infty, r) \cap S))$ then $T(x) \in T(g_\mu^{-1}(-\infty, r) \cap S)$ and
hence $T(x) = T(z)$ for some $z \in g_\mu^{-1}(-\infty, r) \cap S$. Since T is linear,
$x - z \in \ker T$ so that $g_\mu(x) = g_\mu(x - z + z) = g_\mu(z) < r$ and
$x \in g_\mu^{-1}(-\infty, r)$.

If $x \in g_\mu^{-1}(-\infty, r)$ then, since $X = \ker T \oplus S$, $x = k + s$ for
$k \in \ker T$ and $s \in S$. It follows that $T(x) = T(s)$, $s - x \in \ker T$,
$g_\mu(s) = g_\mu(s - x + x) = g_\mu(x) < r$, $s \in g_\mu^{-1}(-\infty, r)$,
$T(x) = T(s) \in T(g_\mu^{-1}(-\infty, r) \cap S)$ and, finally, that
$x \in T^{-1}(T(g_\mu^{-1}(-\infty, r) \cap S))$.

We now show that $T(g_\mu^{-1}(-\infty, r) \cap S) \in \mathcal{B}(Y)$. Let $T_S: S \to Y$ be the
restriction of T to S and observe that $T_S$ is a one-to-one continuous
mapping of the Banach space S onto the Banach space Y. Since $T_S$
satisfies the hypothesis of the open mapping theorem, $T_S$ is a
homeomorphism of S onto Y. Since such mappings take elements of $\mathcal{B}(S)$
into elements of $\mathcal{B}(Y)$ and $g_\mu$ is measurable, $g_\mu^{-1}(-\infty, r) \cap S \in \mathcal{B}(X) \cap S = \mathcal{B}(S)$.
It follows that $T(g_\mu^{-1}(-\infty, r) \cap S) = T_S(g_\mu^{-1}(-\infty, r) \cap S) \in \mathcal{B}(Y)$ and the
proof of the theorem is complete.


Theorem 3. Let $\mathcal{M}_0 \equiv \lambda$, $\lambda(B) = \lambda(B - y)$ for each $y \in \ker T$ and
$B \in \mathcal{B}(X)$ such that $\lambda(B) = 0$, $\lambda(C) > 0$ for each non-empty open subset C of X
and let $[d\mu/d\lambda]$ contain a continuous representative element $f_\mu$ for each
$\mu \in \mathcal{M}_0$.

A necessary and sufficient condition that T be a sufficient statistic
for $\mathcal{M}_0$ is that

$$\ker T \subset \{y : f_\mu(y + x) = f_\mu(x),\ x \in X\}.$$


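Proof: In order to see that the condition is sufficient we need only show
(according to Theorem 1) that $f_\mu\,(\in)\,T^{-1}(\mathcal{B}(Y))$, or equivalently, if $r \in R$,
that $f_\mu^{-1}(-\infty, r) = T^{-1}(B_r)$ for some $B_r \in \mathcal{B}(Y)$. In fact, since T is an
open mapping and $f_\mu$ is continuous, $T(f_\mu^{-1}(-\infty, r)) \in \mathcal{B}(Y)$. We take
$B_r = T(f_\mu^{-1}(-\infty, r))$ and conclude the argument by showing that
$f_\mu^{-1}(-\infty, r) = T^{-1}T(f_\mu^{-1}(-\infty, r))$. We clearly need only establish that
$T^{-1}T(f_\mu^{-1}(-\infty, r)) \subset f_\mu^{-1}(-\infty, r)$. If $x \in T^{-1}T(f_\mu^{-1}(-\infty, r))$ then $T(x) = T(z)$
for some $z \in f_\mu^{-1}(-\infty, r)$. Since $x - z \in \ker T$ it follows that
$f_\mu(x) = f_\mu(x - z + z) = f_\mu(z) < r$ and hence that $x \in f_\mu^{-1}(-\infty, r)$.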
In order to prove the necessity of the condition, recall the proof of
the necessity of the condition in Theorem 2, and observe that the hypothesis
$X = \ker T \oplus S$ for some closed subspace S of X was not essential. We
may conclude that if $\mu \in \mathcal{M}_0$ there exists $g_\mu \in [d\mu/d\lambda]$ such that
$\ker T \subset \{y : g_\mu(y + x) = g_\mu(x),\ x \in X\}$ and $f_\mu = g_\mu$ except on a set
$B \in \mathcal{B}(X)$ such that $\lambda(B) = 0$.

Fix $y \in \ker T$. Since $\{x : f_\mu(y + x) \ne g_\mu(y + x)\} = B - y$ and
$\lambda(B - y) = \lambda(B) = 0$, we may conclude that $f_\mu(x) = f_\mu(y + x)$ except on
$C = B \cup (B - y)$ and $\lambda(C) = 0$. Moreover, since the mapping $x \to y + x$
is a homeomorphism of X onto X and $f_\mu$ is continuous, C is an open
subset of X. According to the hypothesis, $\lambda(C) = 0$ and C open imply
C is empty so that $f_\mu(y + x) = f_\mu(x)$ for each $x \in X$.

3. Normal Families. In what follows we will assume that $\{P_i\}_{i=0}^{m-1}$
is a family of m probability measures defined on $\mathcal{B}(R^n)$ having normal
densities

$$p_i(x) = (2\pi)^{-n/2}\,|\Omega_i|^{-1/2} \exp\left[-\tfrac{1}{2}(x - \eta_i)^T \Omega_i^{-1}(x - \eta_i)\right]; \quad i = 0, 1, \dots, m-1,$$

where $\eta_i$ and $\Omega_i$ are known and $\Omega_i$ is symmetric and positive definite.

We will derive necessary and sufficient conditions that a $k \times n$ matrix B
($k \le n$) mapping $R^n$ onto $R^k$ (i.e., rank(B) = k) be a sufficient
statistic for $\{P_i\}_{i=0}^{m-1}$. We first prove a Lemma.

Lemma 1. If $1 \le i \le m-1$ and $f_i(x) = p_i(x)/p_0(x)$ then

$$\{y : f_i(y + x) = f_i(x),\ x \in X\} = \ker(\Omega_i^{-1} - \Omega_0^{-1}) \cap \{\Omega_i^{-1}\eta_i - \Omega_0^{-1}\eta_0\}^{\perp}.$$
Proof: Fix $y \in R^n$. After a little matrix algebra (which we will omit) we
find that $f_i(y + x) = f_i(x)$ for each $x \in R^n$ if and only if

$$2x^T(\Omega_i^{-1} - \Omega_0^{-1})y - 2y^T(\Omega_i^{-1}\eta_i - \Omega_0^{-1}\eta_0) + y^T(\Omega_i^{-1} - \Omega_0^{-1})y = 0$$

for each $x \in R^n$. For $x = -y/2$ we see that $y^T(\Omega_i^{-1}\eta_i - \Omega_0^{-1}\eta_0) = 0$, so
that $y \in \{\Omega_i^{-1}\eta_i - \Omega_0^{-1}\eta_0\}^{\perp}$. In addition, it follows that
$2x^T(\Omega_i^{-1} - \Omega_0^{-1})y + y^T(\Omega_i^{-1} - \Omega_0^{-1})y = 0$ and, writing $x = (z - y)/2$, that
$z^T(\Omega_i^{-1} - \Omega_0^{-1})y = 0$ for each $z \in X$. This clearly implies $(\Omega_i^{-1} - \Omega_0^{-1})y = 0$
so that $y \in \ker(\Omega_i^{-1} - \Omega_0^{-1})$. The remaining containment follows easily.
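
The matrix algebra omitted in the proof of Lemma 1 can be sketched as follows. The shorthand $A_i = \Omega_i^{-1} - \Omega_0^{-1}$ and $b_i = \Omega_i^{-1}\eta_i - \Omega_0^{-1}\eta_0$ is introduced here only for brevity and does not appear in the original; $c_i$ collects every term that does not depend on x.

\begin{align*}
\log f_i(x) &= \log p_i(x) - \log p_0(x)
             = c_i - \tfrac12 x^T A_i x + x^T b_i, \\
\log f_i(y + x) - \log f_i(x)
            &= -\tfrac12 (x + y)^T A_i (x + y) + (x + y)^T b_i
               + \tfrac12 x^T A_i x - x^T b_i \\
            &= -x^T A_i y - \tfrac12 y^T A_i y + y^T b_i .
\end{align*}

The second equality uses the symmetry of $A_i$. Multiplying by $-2$ shows that $f_i(y + x) = f_i(x)$ for every x exactly when $2x^T A_i y - 2y^T b_i + y^T A_i y = 0$ for every x, which is the displayed condition in the proof.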

Theorem 4. A necessary and sufficient condition that a $k \times n$ rank k
matrix B be a sufficient statistic for $\{P_i\}_{i=0}^{m-1}$ is that

$$\ker B \subset \bigcap_{i=1}^{m-1} \left[\ker(\Omega_i^{-1} - \Omega_0^{-1}) \cap \{\Omega_i^{-1}\eta_i - \Omega_0^{-1}\eta_0\}^{\perp}\right].$$

Proof: Since the preliminary conditions of Theorem 3 are clearly satisfied
for $\lambda = P_0$, Lemma 1 insures the necessity and sufficiency of the condition.

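Theorem 5. A necessary and sufficient condition that a $k \times n$ rank k matrix B
be a sufficient statistic for $\{P_i\}_{i=0}^{m-1}$ is that, for $j = 1, \dots, m-1$,

(a) $\Omega_j B^T (B\Omega_j B^T)^{-1} = \Omega_0 B^T (B\Omega_0 B^T)^{-1}$

(b) $\eta_j - \Omega_j B^T (B\Omega_j B^T)^{-1} B\eta_j = \eta_0 - \Omega_0 B^T (B\Omega_0 B^T)^{-1} B\eta_0$

(c) $\Omega_j - \Omega_j B^T (B\Omega_j B^T)^{-1} B\Omega_j = \Omega_0 - \Omega_0 B^T (B\Omega_0 B^T)^{-1} B\Omega_0$.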
Proof: Let $(x|y) = x^Ty$ and $(x|y)_i = x^T\Omega_i^{-1}y$, $i = 0, 1, \dots, m-1$.
For $S \subset R^n$, $S^{\perp}$ and $S^{\perp_i}$ will denote, respectively, the orthogonal
complements of S relative to the inner products $(\cdot|\cdot)$ and $(\cdot|\cdot)_i$.
If A is an $n \times n$ matrix, $A^{*_i}$ will denote the adjoint of A relative
to the inner product $(\cdot|\cdot)_i$ on $R^n$. If A is a $k \times n$ matrix, $A^{*_i}$ will
denote the adjoint of A relative to the inner products $(\cdot|\cdot)_i$ on $R^n$
and $(\cdot|\cdot)$ on $R^k$. It follows that $B^{*_i} = \Omega_i B^T$.

If B is a sufficient statistic for $\{P_i\}_{i=0}^{m-1}$ then, according to
Theorem 4, $\ker B \subset \ker(\Omega_j^{-1} - \Omega_0^{-1})$; $j = 1, \dots, m-1$ and hence
$(\ker B)^{\perp_j} = (\ker B)^{\perp_0}$. Since this implies range$(B^{*_j})$ = range$(B^{*_0})$ we have
that $B^{*_0}(BB^{*_0})^{-1}BB^{*_j} = B^{*_j}$ and hence that $\Omega_j B^T(B\Omega_j B^T)^{-1} = \Omega_0 B^T(B\Omega_0 B^T)^{-1}$,
which is (a).

Now let $Q = \Omega_0 B^T(B\Omega_0 B^T)^{-1}B$ and observe that $Q^{*_j} = Q = Q^2$ for
$j = 1, \dots, m-1$. It follows that $\ker Q = \ker B \subset \ker(\Omega_j^{-1} - \Omega_0^{-1})$, that
$(\Omega_j^{-1} - \Omega_0^{-1})Q = \Omega_j^{-1} - \Omega_0^{-1}$, and hence that $Q(\Omega_j - \Omega_0) = \Omega_j - \Omega_0$ which,
recalling the definition of Q, is equivalent to (c).

Since $\ker(\Omega_j^{-1} - \Omega_0^{-1}) \cap \{\Omega_j^{-1}\eta_j - \Omega_0^{-1}\eta_0\}^{\perp} \subset \{\eta_j - \eta_0\}^{\perp_0}$ and
$\eta_j - \eta_0 \in (\ker B)^{\perp_0}$ = range$(B^{*_0})$ = range$(Q)$, it follows that
$Q(\eta_j - \eta_0) = \eta_j - \eta_0$ which, recalling the definition of Q, is equivalent
to (b).

Since all of the preceding arguments are reversible, (a), (b) and (c)
imply B is a sufficient statistic for $\{P_i\}_{i=0}^{m-1}$, completing the proof of
the theorem.

In the next theorem we will use the fact that there exists a nonsingular
matrix M such that $M\Omega_0M^T = I$ and hence that the affine transformation
$x \to M(x - \eta_0)$ provides a change of variables that allows (without loss
of generality or the ability to recover the sufficient statistic relative to
the original variables) one to assume that $\eta_0 = 0$ and $\Omega_0 = I$.

Theorem 6. If $\eta_0 = 0$ and $\Omega_0 = I$ then a necessary and sufficient condition
that a $k \times n$ rank k matrix B be sufficient for $\{P_i\}_{i=0}^{m-1}$ is that there
exist a rank k orthogonal projection Q such that, for $i = 1, \dots, m-1$,

$$(I - Q)\,[\eta_1 | \eta_2 | \cdots | \eta_{m-1} | \Omega_1 - I | \Omega_2 - I | \cdots | \Omega_{m-1} - I] = Z$$

where Z is the $n \times (n + 1)(m - 1)$ zero matrix.

Proof: If B is a sufficient statistic for $\{P_i\}_{i=0}^{m-1}$, assume without
loss of generality that $BB^T = I$ since B is a sufficient statistic for
$\{P_i\}_{i=0}^{m-1}$ if and only if KB is a sufficient statistic for each nonsingular
$k \times k$ matrix K. One may indeed choose K such that $KBB^TK^T = (KB)(KB)^T = I$.

For $i = 1, \dots, m-1$ Theorem 5 implies that

$$\Omega_i B^T(B\Omega_i B^T)^{-1} = I B^T(B I B^T)^{-1} = B^T$$

so that

$$(B\Omega_i B^T)^{-1} = B\Omega_i^{-1}B^T \quad\text{and}\quad \Omega_i B^T(B\Omega_i B^T)^{-1}B = B^TB.$$

Right multiplication of the latter equation by $\Omega_i B^TB$ will establish that

$$\Omega_i B^TB = B^TB\,\Omega_i B^TB$$

from whence it follows, using symmetry, that

$$\Omega_i B^TB = B^TB\,\Omega_i.$$

Since $\eta_0 = 0$ and $\Omega_0 = I$, Theorem 5 further implies

$$\eta_i - B^TB\eta_i = 0$$

and

$$\Omega_i - B^TB\,\Omega_i = I - B^TB.$$

Since $BB^T = I$, it follows that $B^T = B^+$ (where $(\cdot)^+$ denotes the
generalized inverse of $(\cdot)$) and hence that $Q = B^TB = B^+B$ is the
orthogonal projection on the range of $B^T$ [5]. Clearly Q has rank k and
we conclude that

$$(I - Q)\eta_i = 0$$

and

$$(I - Q)(\Omega_i - I) = Z$$

and the condition follows. Conversely, if the condition holds let B be any
$k \times n$ rank k matrix such that range$(B^T)$ = range$(Q)$. Clearly $B^+B = Q$,

BB*** = I and B**" = Using the symmetry of I - Q and - I we conclude 

that 

i 1 ■ ■ . . 

and hence that 

Q = = fi3‘’‘BB^CBE^^B'‘^)”^B 

In addition, 

m rn —1 m 

J2^B ) = B 

The obvious substitution for Q guarantees the satisfaction of the 
conditions of Theorem 5. 
Definition 1. We will say that a rank k orthogonal projection Q
generates a sufficient statistic for $\{P_i\}_{i=0}^{m-1}$ provided Q satisfies
the condition in Theorem 6.

Corollary 1. If $M = [\eta_1 | \eta_2 | \cdots | \eta_{m-1} | \Omega_1 - I | \cdots | \Omega_{m-1} - I]$ then

a) $Q = MM^+$ generates a sufficient statistic for $\{P_i\}_{i=0}^{m-1}$

and

b) $k = \mathrm{rank}(MM^+) = \mathrm{tr}(MM^+)$ is the smallest integer for which
there exists a rank k orthogonal projection generating a
sufficient statistic for $\{P_i\}_{i=0}^{m-1}$.

Proof: Let k be the smallest integer for which there exists a rank k
orthogonal projection P generating a sufficient statistic for $\{P_i\}_{i=0}^{m-1}$.
According to the definition of M, $(I - P)M = Z$ so that $PM = M$
and $PMM^+ = MM^+$. Since $(I - MM^+)M = Z$, $MM^+$ generates a sufficient
statistic for $\{P_i\}_{i=0}^{m-1}$. However, $PMM^+ = MM^+$ implies that
range$(MM^+) \subset$ range$(P)$ so that the minimality of k and the fact that $MM^+$ is
an orthogonal projection imply that range$(MM^+)$ = range$(P)$ and hence that
$MM^+ = P$.

in> !L 

Corollary 2. If B is a sufficient statistic for {P^}^_q . then 

i “ 0, 1 m-1 . 

Proof: The conclusion is an immediate consequence of line 6 in the proof 

of Theorem 6. 
4. Concluding Remarks. Theorems 4 and 5, although not so stated, are
valid for arbitrary families of n-variate normal probability measures.
Corollary 1 formally gives the construction for a sufficient statistic
for finite families of n-variate normal probability measures solely in
terms of the known parameters that determine the densities. In fact, if
$k = \mathrm{rank}(M)$ ($= \mathrm{rank}\, MM^+$) then any rank k matrix B for which range$(B^T)$ = range$(M)$
is a sufficient statistic for the family. Moreover, in terms of the
dimension of the range of a sufficient statistic, $k = \mathrm{rank}\, M$ is the smallest
integer for which there exists a sufficient statistic.
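
The construction described above can be sketched numerically. The following is a minimal illustration, not the authors' procedure: it assumes the reference class has already been normalized to $\eta_0 = 0$, $\Omega_0 = I$, it uses modified Gram-Schmidt in place of a generalized inverse to obtain an orthonormal basis of range(M), and the helper names (`orthonormal_basis`, `columns_of_M`, `residual`) and the example data are hypothetical.

```python
import math

def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt: orthonormal basis of the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coef = sum(a * b for a, b in zip(w, q))
            w = [a - coef * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:                      # keep only directions that are new
            basis.append([a / norm for a in w])
    return basis

def columns_of_M(etas, omegas):
    """Columns of M = [eta_1|...|eta_{m-1}|Omega_1 - I|...|Omega_{m-1} - I]."""
    n = len(etas[0])
    cols = [[float(e) for e in eta] for eta in etas]
    for om in omegas:
        for j in range(n):
            cols.append([om[i][j] - (1.0 if i == j else 0.0) for i in range(n)])
    return cols

def residual(B, cols):
    """Largest entry (in absolute value) of (I - B^T B) M."""
    worst = 0.0
    for c in cols:
        coefs = [sum(b * x for b, x in zip(row, c)) for row in B]          # B c
        proj = [sum(coefs[r] * B[r][i] for r in range(len(B)))             # B^T B c
                for i in range(len(c))]
        worst = max(worst, max(abs(x - p) for x, p in zip(c, proj)))
    return worst

# Hypothetical data: two non-reference classes in R^3 (assumed eta_0 = 0, Omega_0 = I).
etas = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
omegas = [
    [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]
M_cols = columns_of_M(etas, omegas)
B = orthonormal_basis(M_cols)   # rows of B span range(M), and B B^T = I
k = len(B)                      # k = rank(M)
```

Here the rows of `B` are orthonormal, so $B^TB$ is the orthogonal projection onto range(M) and `residual(B, M_cols)` checks the condition of Theorem 6 with $Q = B^TB$.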

Several open questions concerning the "appropriate" definition of an
"almost" sufficient statistic using the characterizations given in
Theorems 4 and 5 will be the subject of a later paper. In this connection
the results of Le Cam [4], although the approach is different, should be of
significant value.

5. Acknowledgement. The authors would like to express their sincere
appreciation to Professor H. Elton Lacey for his comments.
Estimating Mixture Proportions

James Sparra

1. Summary. A stochastic approximation algorithm for estimating the proportions
in a mixture of normal densities is presented. The algorithm is shown to
converge to the true proportions in the case of a mixture of two normal densities.

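2. Introduction. Let $A = \{\alpha \in R^m : \alpha_i > 0 \text{ and } \sum_{i=1}^m \alpha_i = 1\}$. For each i,
$i = 1, \dots, m$, let $\mu_i$ be an element of $R^n$ and $\Sigma_i$ be a positive definite
real symmetric $n \times n$ matrix. Let X be a random variable with values in $R^n$
and with density function

$$p(\hat\alpha, x) = \sum_{i=1}^m \hat\alpha_i\, p_i(x), \quad \text{for } x \in R^n,$$

where $\hat\alpha \in A$ and

$$p_i(x) = (2\pi)^{-n/2}\,|\Sigma_i|^{-1/2} \exp\left\{-\tfrac12 (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right\}$$

for each $i = 1, \dots, m$.

We assume that $\hat\alpha$ is not known but that $\mu_i$ and $\Sigma_i$ are known for
$i = 1, \dots, m$. An algorithm for estimating $\hat\alpha$ will be presented in part 3 of
this paper and in part 4 the algorithm will be shown to converge to $\hat\alpha$ in mean
square and with probability 1 in the case where m = 2.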
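3. The Algorithm. Let $\{x_k\}_{k=1}^{\infty}$ be a sequence of observations on X. Let
$\alpha^0 \in A$. For $n \ge 0$ define $\alpha^{n+1}$ by

$$\alpha^{n+1} = \alpha^n - c_n Z^n,$$

where

$$(Z^n)_i = \alpha_i^n - \frac{\alpha_i^n\, p_i(x_{n+1})}{\sum_{j=1}^m \alpha_j^n\, p_j(x_{n+1})}, \quad i = 1, \dots, m,$$

and $\{c_k\}_{k=0}^{\infty}$ is a sequence of positive numbers such that

$$\sum_{k=0}^{\infty} c_k = \infty \quad\text{and}\quad \sum_{k=0}^{\infty} c_k^2 < \infty.$$

We note that each iterate is in A and that, since X is a random variable,
each iterate may itself be considered a random variable.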
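
A minimal sketch of such an iteration for a mixture of two univariate normal densities is given below. It assumes the update $\alpha^{n+1} = \alpha^n - c_n Z^n$ with $(Z^n)_i = \alpha_i^n - \alpha_i^n p_i(x)/\sum_j \alpha_j^n p_j(x)$, which matches the quantity $Z_\alpha$ used in the convergence proof of part 4, and it takes $c_n = 1/(n+2)$ as one admissible step sequence. The function names, the component parameters, and the sample size are all hypothetical choices made for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def sa_step(alpha, x, params, c):
    """One update alpha <- alpha - c * Z, Z_i = alpha_i - alpha_i p_i(x) / p(alpha, x)."""
    dens = [normal_pdf(x, m, v) for m, v in params]
    mix = sum(a * d for a, d in zip(alpha, dens))
    return [a - c * (a - a * d / mix) for a, d in zip(alpha, dens)]

def estimate_proportions(samples, params, alpha0):
    alpha = list(alpha0)
    for n, x in enumerate(samples):
        # c_n = 1/(n+2): the c_n sum to infinity, their squares have a finite sum,
        # and c_n < 1 keeps every component of alpha positive.
        alpha = sa_step(alpha, x, params, 1.0 / (n + 2))
    return alpha

random.seed(0)
params = [(0.0, 1.0), (4.0, 1.0)]          # (mean, variance) of each component
true_alpha = [0.3, 0.7]
samples = []
for _ in range(5000):
    mean, var = params[0] if random.random() < true_alpha[0] else params[1]
    samples.append(random.gauss(mean, math.sqrt(var)))

estimate = estimate_proportions(samples, params, [0.5, 0.5])
```

Because the components of $Z^n$ sum to zero, each update leaves the sum of the proportions unchanged, which is the sense in which every iterate stays in A.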


4. Convergence of the Algorithm. 


Theorem; If a £ then the algorithm described in part 3 converges to a: 


in mean square and with probability 1.


Proof: We refer the reader to the algorithm described in [1, pp. 332-333] and
to the proof of convergence given in [1, pp. 350-352]. The applicability of the
theorem given there is clear if we let $f(\alpha) = E(Z_\alpha)$, for each $\alpha \in A$, where

$$(Z_\alpha)_i = \alpha_i - \frac{\alpha_i\,(p_i \circ X)}{p_\alpha \circ X}$$

and $p_\alpha(x) = p(\alpha, x)$.
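In order to show convergence we must show that conditions (A1)-(A3) in
[1, pp. 332-333] are satisfied. First we note that

$$f_i(\alpha) = \alpha_i - \alpha_i\, g_i(\alpha_i), \quad i = 1, 2,$$

where, with $p_{\hat\alpha}(x) = p(\hat\alpha, x)$,

$$g_1(\alpha_1) = \int \frac{p_1(x)}{\alpha_1 p_1(x) + (1 - \alpha_1)p_2(x)}\; p_{\hat\alpha}(x)\,dx, \qquad
  g_2(\alpha_2) = \int \frac{p_2(x)}{(1 - \alpha_2)p_1(x) + \alpha_2 p_2(x)}\; p_{\hat\alpha}(x)\,dx.$$

Further, we note that

$$\frac{d^2 g_1(\alpha_1)}{d\alpha_1^2} = 2\int \frac{p_1(x)\,[p_1(x) - p_2(x)]^2}{[\alpha_1 p_1(x) + (1 - \alpha_1)p_2(x)]^3}\; p_{\hat\alpha}(x)\,dx > 0,$$

$$\frac{d^2 g_2(\alpha_2)}{d\alpha_2^2} = 2\int \frac{p_2(x)\,[p_1(x) - p_2(x)]^2}{[(1 - \alpha_2)p_1(x) + \alpha_2 p_2(x)]^3}\; p_{\hat\alpha}(x)\,dx > 0.$$

Now, $g_1(\hat\alpha_1) = 1$ and $g_1(1) = 1$. Since $g_1$ has positive second derivative
we have that $g_1(\alpha_1) < 1$ if $\alpha_1 \in (\hat\alpha_1, 1)$ and $g_1(\alpha_1) > 1$ if $\alpha_1 \in (0, \hat\alpha_1)$.
Similarly, $g_2(\hat\alpha_2) = 1 = g_2(1)$, $g_2(\alpha_2) < 1$ if $\alpha_2 \in (\hat\alpha_2, 1)$
and $g_2(\alpha_2) > 1$ if $\alpha_2 \in (0, \hat\alpha_2)$.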
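
The behaviour of $g_1$ claimed above can be checked numerically for a concrete case. The sketch below assumes two univariate components N(0, 1) and N(4, 1) and a true proportion of 0.3 for the first component (all hypothetical choices), and approximates $g_1(\alpha_1) = \int p_1 p_{\hat\alpha} / (\alpha_1 p_1 + (1 - \alpha_1) p_2)\,dx$ by the trapezoidal rule on a truncated interval.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def g1(alpha1, alpha1_true, lo=-10.0, hi=14.0, steps=4800):
    """Trapezoidal approximation of g_1(alpha1) on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        x = lo + s * h
        p1 = normal_pdf(x, 0.0, 1.0)
        p2 = normal_pdf(x, 4.0, 1.0)
        true_mix = alpha1_true * p1 + (1.0 - alpha1_true) * p2
        value = p1 * true_mix / (alpha1 * p1 + (1.0 - alpha1) * p2)
        total += value * (0.5 if s in (0, steps) else 1.0)   # endpoint weights
    return total * h

at_truth = g1(0.3, 0.3)   # should be close to 1
above = g1(0.6, 0.3)      # alpha_1 between the true value and 1
below = g1(0.1, 0.3)      # alpha_1 between 0 and the true value
```

With these values one expects `at_truth` near 1, `above` below 1, and `below` above 1, in line with the convexity argument.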

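We now show that (A1)-(A3) are satisfied: Let $\alpha \in A$. Then

(A1) $f(\alpha) = 0$ iff $g_1(\alpha_1) = 1 = g_2(\alpha_2)$ iff $\alpha = \hat\alpha$.

(A2) $(\alpha - \hat\alpha)^T f(\alpha) = (\alpha_1 - \hat\alpha_1)(\alpha_1 - \alpha_1 g_1(\alpha_1)) + (\alpha_2 - \hat\alpha_2)(\alpha_2 - \alpha_2 g_2(\alpha_2))$.
If $\alpha_1 > \hat\alpha_1$ then $g_1(\alpha_1) < 1$ and $(\alpha_1 - \alpha_1 g_1(\alpha_1)) > 0$. Then also
$\alpha_2 < \hat\alpha_2$, $g_2(\alpha_2) > 1$ and $(\alpha_2 - \alpha_2 g_2(\alpha_2)) < 0$. Thus, if
$\alpha_1 > \hat\alpha_1$ then $(\alpha - \hat\alpha)^T f(\alpha) > 0$. Similarly, if $\alpha_1 < \hat\alpha_1$ then
$(\alpha - \hat\alpha)^T f(\alpha) > 0$. Thus, A2 is satisfied in any closed, convex
subset of A.

(A3)
$$E(\|Z_\alpha\|^2) = \sum_{i=1}^{2}\left(\alpha_i^2 - 2\alpha_i \int \frac{\alpha_i p_i(x)}{p_\alpha(x)}\; p_{\hat\alpha}(x)\,dx + \int \left(\frac{\alpha_i p_i(x)}{p_\alpha(x)}\right)^2 p_{\hat\alpha}(x)\,dx\right).$$

Now, we note that each term in the ith summand, i = 1, 2, is
less than 1 so that there is an h > 0 such that $E(\|Z_\alpha\|^2) < h$
for all $\alpha \in A$ and A3 is satisfied.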
Introduction. Recent statistical work in feature selection for the multivariate
normal pattern recognition problem has concentrated on linearly transforming
pattern classes so that the transformed pattern classes are equivalently
distinguishable. Since, in general, this is not possible, techniques have been
developed to preserve the distinction of the transformed pattern classes using
various measures of distinction. These measures of pattern class distinction
are most often treated as eigenvalue problems ([5], [6], [7], [9],
[13], [14], [15]). In this paper we consider a particular measure of pattern
class distinction called the average interclass divergence, or more simply,
divergence ([1], [2], [4], [6], [7], [8], [9], [10], [11]), where divergence
will be the pairwise average of the expected interclass divergence derived from
Hajek's two-class divergence as defined, for example, in [9].


This work was supported in part by NASA under Contract JSC-NAS-T5000 


It has been shown in [4] that there always exists a $k \times n$ real matrix
B such that the transformation determined by B maximizes divergence in
k-dimensional space, and, in fact, that B can be written in the form
$(I_k|Z)U$, where U is an orthogonal $n \times n$ matrix. We will investigate the
role of the eigenvalues of U in such problems, and give an example
demonstrating that the divergence measure of pattern class distinction does not
depend on these eigenvalues (Theorem 7).

Our example is derived from the family of examples constructed in [3]. 

This special class of examples permits analytical calculation of divergence, 
a task ordinarily eschewed as unrealistic, and yields a precise expression 
for divergence. The reader is cautioned, however, not to confuse the numerical 
simplicity of this example with impracticality, since, mathematically, the 
failure of the eigenvalues of U to affect divergence in the restricted case 
erases any hope that they might be meaningful in an arbitrary case, however
applied. 

1. Special divergence formulas. Let $\Omega_1, \dots, \Omega_m$ and $\mu_1, \dots, \mu_m$ be the
covariance matrices and means for m classes, where for each $i = 1, \dots, m$,
$\Omega_i$ is an $n \times n$ positive definite matrix and $\mu_i$ is a column n vector.

Let $S_i = \sum_{j \ne i} \left[\Omega_j + (\mu_i - \mu_j)(\mu_i - \mu_j)^T\right]$, $i = 1, \dots, m$.

Then, assuming equal a priori probabilities, the average interclass divergence
for these m classes is given by
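$$D = \tfrac12\,\mathrm{tr}\Big(\sum_{i=1}^m \Omega_i^{-1} S_i\Big) - \tfrac12\, m(m-1)n; \qquad (1)$$

while, if B is a $k \times n$ matrix, the B-average interclass divergence is

$$D_B = \tfrac12\,\mathrm{tr}\Big(\sum_{i=1}^m (B\Omega_i B^T)^{-1}(BS_iB^T)\Big) - \tfrac12\, m(m-1)k, \qquad (2)$$

where tr represents the trace function.

Moreover, as observed in [3], if

$$\mathcal{B} = \{B \in \mathcal{M}_{k \times n} : BB^T = I_k \text{ and } (B^TB)\Omega_i = \Omega_i(B^TB),\ i = 1, \dots, m\},$$

where $I_k$ is the $k \times k$ identity matrix and $\mathcal{M}_{k \times n}$ is the set of all $k \times n$
real matrices, then, for any $B \in \mathcal{B}$, (2) may be rewritten as

$$D_B = \tfrac12\,\mathrm{tr}\Big(B\Big(\sum_{i=1}^m \Omega_i^{-1} S_i\Big)B^T\Big) - \tfrac12\, m(m-1)k. \qquad (3)$$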

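For the remainder of the paper we assume that each $\Omega_i$ is a diagonal
matrix of the form $\begin{pmatrix} x_i & 0 \\ 0 & I_{n-1} \end{pmatrix}$, where $x_i$ is a positive real number,
and $\mu_i = \mu_j$ for all i, j. Under these restrictions, $\sum_{i=1}^m \Omega_i^{-1} S_i$ is a
diagonal matrix of the form $\begin{pmatrix} x & 0 \\ 0 & pI_{n-1} \end{pmatrix}$, where
$x = \sum_{i=1}^m \frac{1}{x_i}\Big(\sum_{j=1,\, j \ne i}^m x_j\Big)$ and $p = m(m-1)$. It follows from (1) that the
average interclass divergence for the m classes is given by

$$D = \tfrac12 (x - p). \qquad (4)$$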


As observed in the introduction, in seeking to maximize the B-average
interclass divergence $D_B$, it suffices to consider those $k \times n$ matrices of
the form $(I_k|Z)U$, where U is an $n \times n$ orthogonal matrix. In the sequel
when considering $D_B$, we shall always assume that B is of this form. For
any such $k \times n$ matrix it is obvious that $BB^T = I_k$, and hence $B \in \mathcal{B}$
if and only if $(B^TB)\Omega_i = \Omega_i(B^TB)$ for $i = 1, \dots, m$. We will derive necessary
and sufficient conditions in order that $B \in \mathcal{B}$ (Theorem 2), but first we
calculate $D_B$ in the case that formula (3) is valid. Recall that all means
are hereafter considered equal and all covariance matrices diagonal of the
form stated above.

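Theorem 1. Let $B = (I_k|Z)U$, where $U = (u_{ij})$ is an $n \times n$ orthogonal
matrix, and suppose $D_B$ is given as in (3) above. Then

$$D_B = \tfrac12\Big(\sum_{j=1}^k u_{j1}^2\Big)(x - p) = \Big(\sum_{j=1}^k u_{j1}^2\Big) D. \qquad (5)$$

Proof: Since tr(XY) = tr(YX) whenever both products are defined, we have
in this case $D_B = \tfrac12\,\mathrm{tr}\big(B^TB(\sum_{i=1}^m \Omega_i^{-1}S_i)\big) - \tfrac12 pk$. If U is written in
block form, $U = \begin{pmatrix} A & C \\ E & F \end{pmatrix}$, where A is $k \times k$, then

$$B^TB = U^T(I_k|Z)^T(I_k|Z)U = \begin{pmatrix} A^TA & A^TC \\ C^TA & C^TC \end{pmatrix}.$$

Since $\sum_{i=1}^m \Omega_i^{-1}S_i = p\begin{pmatrix} M & 0 \\ 0 & I_{n-k} \end{pmatrix}$, where M is the $k \times k$ matrix
$\begin{pmatrix} x/p & 0 \\ 0 & I_{k-1} \end{pmatrix}$, then
$B^TB\big(\sum_{i=1}^m \Omega_i^{-1}S_i\big) = p\begin{pmatrix} A^TAM & A^TC \\ C^TAM & C^TC \end{pmatrix}$. Therefore,

$$\mathrm{tr}\Big(B^TB\Big(\sum_{i=1}^m \Omega_i^{-1}S_i\Big)\Big) = p\big(\mathrm{tr}(A^TAM) + \mathrm{tr}(C^TC)\big)
  = p\Big(\frac{x}{p}\sum_{j=1}^k u_{j1}^2 + \sum_{q=2}^{k}\sum_{j=1}^k u_{jq}^2 + \sum_{q=k+1}^{n}\sum_{j=1}^k u_{jq}^2\Big)
  = \Big(\sum_{j=1}^k u_{j1}^2\Big)x + p\sum_{q=2}^{n}\sum_{j=1}^k u_{jq}^2.$$

Since U is orthogonal, $\sum_{q=2}^{n}\sum_{j=1}^k u_{jq}^2 = \sum_{j=1}^k (1 - u_{j1}^2) = k - \sum_{j=1}^k u_{j1}^2$, so that

$$D_B = \tfrac12\Big(\Big(\sum_{j=1}^k u_{j1}^2\Big)x + p\Big(k - \sum_{j=1}^k u_{j1}^2\Big)\Big) - \tfrac12 pk
     = \tfrac12\Big(\sum_{j=1}^k u_{j1}^2\Big)(x - p) = \Big(\sum_{j=1}^k u_{j1}^2\Big) D.$$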
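
The identity between formula (3) and the closed form in Theorem 1 can be checked on a small case. The sketch below assumes m = 2 classes in $R^3$ with $\Omega_i = \mathrm{diag}(x_i, 1, 1)$ and equal means, so that $\sum_i \Omega_i^{-1}S_i = \mathrm{diag}(x, p, p)$ with $x = \sum_i x_i^{-1}\sum_{j \ne i} x_j$ and $p = m(m-1)$. The numerical values, the rotation angle, and the function name are hypothetical.

```python
import math

def divergence_formula_3(B, diag, p):
    """(1/2) tr(B diag(d) B^T) - (1/2) p k for a k x n matrix B given as rows."""
    k = len(B)
    trace = sum(d * row[q] ** 2 for row in B for q, d in enumerate(diag))
    return 0.5 * trace - 0.5 * p * k

xs = [2.0, 5.0]                 # x_i in Omega_i = diag(x_i, 1, ..., 1)
m, n, k = len(xs), 3, 2
p = m * (m - 1)
x = sum(sum(xj for j, xj in enumerate(xs) if j != i) / xi
        for i, xi in enumerate(xs))
D = 0.5 * (x - p)               # formula (4)

theta = 0.7                     # rotation in the (1, 3) coordinate plane
c, s = math.cos(theta), math.sin(theta)
U = [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]
B = U[:k]                       # B = (I_k | Z) U is the first k rows of U

D_B = divergence_formula_3(B, [x] + [float(p)] * (n - 1), p)
weight = sum(B[j][0] ** 2 for j in range(k))   # sum over j <= k of u_{j1}^2
```

For this choice `D_B` and `weight * D` agree up to rounding. Note that the check only confirms the algebra relating (3) to the closed form; whether (3) itself equals (2) still depends on B belonging to the commuting class.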


Our next result gives necessary and sufficient conditions in order that
$B = (I_k|Z)U \in \mathcal{B}$. While the proof is rather tedious, these conditions are
particularly easy to apply and hence useful in seeking examples.

Theorem 2. Let $B = (I_k|Z)U$, where $U = (u_{ij})$ is an $n \times n$ orthogonal matrix.
If, for each $i = 1, \dots, m$, $\Omega_i = \begin{pmatrix} x_i & 0 \\ 0 & I_{n-1} \end{pmatrix}$, then:

(1) if $x_i = 1$ for all i, then $B \in \mathcal{B}$;

(2) if $x_i \ne 1$ for at least one i, then $B \in \mathcal{B}$ if and only if
$\sum_{j=1}^k u_{j1}^2 = 1$ or $\sum_{j=1}^k u_{j1}^2 = 0$.
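
The commutation condition in part (2) can be illustrated on $3 \times 3$ rotations. The sketch below assumes n = 3, k = 2 and a single covariance $\Omega = \mathrm{diag}(x, 1, 1)$ with x = 3; the helper names and the chosen angles are hypothetical.

```python
import math

def gram(B):
    """B^T B for a k x n matrix B given as a list of rows."""
    n = len(B[0])
    return [[sum(row[i] * row[j] for row in B) for j in range(n)] for i in range(n)]

def commutes_with_omega(B, x, tol=1e-12):
    """True if B^T B commutes with Omega = diag(x, 1, ..., 1)."""
    G = gram(B)
    n = len(G)
    omega = [x] + [1.0] * (n - 1)
    # (G Omega)_{ij} = G_ij * omega_j and (Omega G)_{ij} = omega_i * G_ij
    return all(abs(G[i][j] * omega[j] - omega[i] * G[i][j]) <= tol
               for i in range(n) for j in range(n))

def rotation_13(theta):
    """Rotation by theta in the (1, 3) coordinate plane of R^3."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]

k = 2
aligned = rotation_13(0.0)[:k]            # first-column weight 1
swapped = rotation_13(math.pi / 2)[:k]    # first-column weight 0 (up to rounding)
generic = rotation_13(0.7)[:k]            # weight cos(0.7)^2, neither 0 nor 1
```

Here `aligned` and `swapped` commute with $\Omega$ while `generic` does not, matching the two admissible values of $\sum_{j=1}^k u_{j1}^2$.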


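Proof: If $x_i = 1$, then $\Omega_i = I_n$ and $(B^TB)\Omega_i = \Omega_i(B^TB)$ for any $k \times n$
matrix B. Thus, if $x_i = 1$ for all i, then $B \in \mathcal{B}$ for any $k \times n$ matrix
of the form $(I_k|Z)U$. We suppose that $x_i \ne 1$ for at least one i. As in the
proof of Theorem 1, we decompose U into the block form $\begin{pmatrix} A & C \\ E & F \end{pmatrix}$, so
that $B^TB = \begin{pmatrix} A^TA & A^TC \\ C^TA & C^TC \end{pmatrix}$, where A is again $k \times k$. For a fixed i such
that $x_i \ne 1$, write $\Omega_i$ in block form $\begin{pmatrix} G_i & 0 \\ 0 & I_{n-k} \end{pmatrix}$, where $G_i$ is the
$k \times k$ matrix $\begin{pmatrix} x_i & 0 \\ 0 & I_{k-1} \end{pmatrix}$. Then
$(B^TB)\Omega_i = \begin{pmatrix} A^TAG_i & A^TC \\ C^TAG_i & C^TC \end{pmatrix}$, while
$\Omega_i(B^TB) = \begin{pmatrix} G_iA^TA & G_iA^TC \\ C^TA & C^TC \end{pmatrix}$. Thus, $B^TB$ commutes with $\Omega_i$ if and only if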
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(1) A^AG^. ** and (2) C^AG. = C^A . We write A^A and C^A in block 

form: A^A = j , C^A = (^R S 1 ’ *■ ^ ^ ^ ^ * 

T T T I ^^i \ 

Since A A is symmetric, N = . Therefore, A AG. = y , 

^ \m'x- W J 

, Thus A^AG. = G|A^A if and only if M = x-M 


and G^.A A = 


X^L x^.M 


and similarly, C^AG^ = C^A if and only if Px^. = P and Rx. = R. Since 


I'P 


and Rx. = R. 
k V 

Si nee 

jSl ^'k+l^^jll 


k : 1 

, it 

j=l ^*ftl j 





follows that Mx^ = M, Px^ = P, and Rx. = R if and only if 
k k 

x,.{,.2r u.,u. = .2, u.,u. for q = 2,...,n. Thus, since x . 1 , we have 

1 J-1 Ji jq J=1 Jl 3q 1 

T T k 

that (B B)S2. = J1.(B B) if and only if .Si u.,u. = 0 for q = 2,...,n. 

11 J I J • J q 

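Since the above argument is valid for any $\Omega_i$ for which $x_i \ne 1$, and since
$B^TB$ commutes with $\Omega_i$ for any i for which $x_i = 1$, it follows that
$B \in \mathcal{B}$ if and only if $\sum_{j=1}^k u_{j1}u_{jq} = 0$ for $q = 2, \dots, n$. We next show that
$\sum_{j=1}^k u_{j1}u_{jq} = 0$ for $q = 2, \dots, n$ if and only if $\sum_{j=1}^k u_{j1}^2 = 1$ or $\sum_{j=1}^k u_{j1}^2 = 0$.

Since U is orthogonal, $\sum_{j=1}^k u_{j1}u_{jq} = -\sum_{j=k+1}^n u_{j1}u_{jq}$ for
$q = 2, \dots, n$, while $1 = \sum_{j=1}^n u_{j1}^2 = \sum_{j=1}^k u_{j1}^2 + \sum_{j=k+1}^n u_{j1}^2$. Thus, if $\sum_{j=1}^k u_{j1}^2 = 1$
then $u_{j1} = 0$ for $j = k+1, \dots, n$, and $\sum_{j=1}^k u_{j1}u_{jq} = -\sum_{j=k+1}^n u_{j1}u_{jq} = 0$ for
$q = 2, \dots, n$. If $\sum_{j=1}^k u_{j1}^2 = 0$, then $u_{j1} = 0$ for $j = 1, \dots, k$ and
obviously $\sum_{j=1}^k u_{j1}u_{jq} = 0$ for $q = 2, \dots, n$.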
Conversely, suppose that $\sum_{j=1}^k u_{j1}u_{jq} = 0$ for $q = 2, \dots, n$. If
$u_{11} = \dots = u_{k1} = 0$, then $\sum_{j=1}^k u_{j1}^2 = 0$ and the proof is complete. Otherwise,
let $u_{r1}$ be the first non-zero element in the first column of U, where
$r \le k$. Then $0 = \sum_{j=r}^k u_{j1}u_{jq}$, so that
$u_{rq} = -\frac{1}{u_{r1}}\sum_{j=r+1}^k u_{j1}u_{jq}$, $q = 2, \dots, n$. Thus, if $u_{r+1,1} = \dots = u_{k1} = 0$,
then $u_{rq} = 0$ for $q = 2, \dots, n$ and it follows that $1 = u_{r1}^2 = \sum_{j=1}^k u_{j1}^2$.

Suppose u^.| f 0 where r < w £ k . Since nl2 %q% " ^ 


Substituting for u^^, q > 2, We have 


^Vl^ifl ^ q=2 \q^u_T j=r+l '^il'^jq^ /^rl%l ^ ^u..i ^ j-r+V^jl^q-2 %q^jq^ *" ^ 


ji jq 


jl'q=2 “wg'-jq^ 


Since U is orthogonal, then for i W w, ■ Z., u. .„u. = -u. ,u., and for 

q=2 wq jq Wl Jl 

j “ "■ qi2 Vjq ' q% ' ’ ■ "wl • j-il "jfqfe %q“jq> “ 

k 2 

%l^j=f+l ^ %1 ’ substituting in (6), we have 

%l(Uri + Multiplying by Uj„-|, we have 

"wl‘"ri j=tl “jl - " ° “wl‘jt “jl - “wlf 0 - It 

k 2 •< 2 

follows that 1 = .Z ' Um = .Z, u^, . 

■ Jf ■ J=1 ■ Jl 


We note that, if there exists at least one $\Omega_i$ which is not the identity
matrix, then the proof of Theorem 2 shows that $B^TB$ commutes with all
$\Omega_i$'s if and only if $B^TB$ commutes with that $\Omega_i$. Moreover, in this case, the
elements of $\mathcal{B}$ are precisely those $B = (I_k|Z)U$ for which the first column of
U is of the form

$$(u_{11}, \dots, u_{k1}, 0, \dots, 0)^T \quad\text{or}\quad (0, \dots, 0, u_{k+1,1}, \dots, u_{n1})^T.$$

Hence, by Theorem 1, if $B \in \mathcal{B}$ then $D_B = D$ or $D_B = 0$. (Note that
if $\Omega_i = I_n$ for all i, then D = 0.)

We close this section with a definition. If V denotes the set of all
$n \times n$ orthogonal matrices, let $\mathcal{J} = \{U = (u_{ij}) \in V : \sum_{j=1}^k u_{j1}^2 = 1 \text{ or } 0\}$.
Thus, if there exists $\Omega_i \ne I_n$, then $B = (I_k|Z)U \in \mathcal{B}$ if and only if
$U \in \mathcal{J}$.

2. Eigenvalues of U. Let $U = (u_{ij})$ be an $n \times n$ orthogonal matrix.
As is well known [12], the eigenvalues of U lie on the unit
circle in the complex plane and non-real eigenvalues occur in conjugate
pairs. Thus, if U has a real eigenvalue $\lambda$, then $\lambda = \pm 1$, and, if
$\mu = a + bi$, $b \ne 0$, is an eigenvalue of U, then $\bar\mu$ is also an
eigenvalue of U. Clearly, $\det U = \pm 1$. Moreover, if 1 has multiplicity p as
an eigenvalue of U, $-1$ multiplicity m, and $\{a_j + b_j i,\ a_j - b_j i\}_{j=1}^{s}$ ($b_j \ne 0$)
are the remaining eigenvalues of U, then U is similar to a block diagonal
orthogonal matrix $PUP^{-1}$ of the form:

$$PUP^{-1} = \mathrm{diag}(1, \dots, 1, -1, \dots, -1, A_1, \dots, A_s), \qquad (7)$$

where 1 appears on the diagonal p times, $-1$ appears m times, and each
$A_j = \begin{pmatrix} a_j & b_j \\ -b_j & a_j \end{pmatrix}$ is a 2×2 orthogonal matrix with eigenvalues $a_j + b_j i$,
$a_j - b_j i$. Furthermore, the order in which the $A_j$'s, 1's, and $-1$'s appear
on the diagonal can be changed to any desired order by a similarity transformation.
Thus, any two orthogonal $n \times n$ matrices with the same set of eigenvalues are
similar. Finally, we observe that if U is a 2×2 orthogonal matrix, then

$$U = \begin{pmatrix} c & -d \\ d & c \end{pmatrix} \quad\text{or}\quad U = \begin{pmatrix} c & d \\ d & -c \end{pmatrix}, \quad\text{where } c^2 + d^2 = 1.$$

Let $B = (I_k|Z)U \in \mathcal{B}$. For the remainder of the paper we will be concerned
with determining what role, if any, the eigenvalues of U play in determining
$D_B$. If $\{\lambda_1, \dots, \lambda_n\}$ is a set of n not necessarily distinct complex numbers
for which there exists an $n \times n$ orthogonal matrix U with eigenvalues
$\lambda_1, \dots, \lambda_n$, then we will say that $\{\lambda_1, \dots, \lambda_n\}$ is a (*) set. We note that
if $T = \{\lambda_1, \dots, \lambda_n\}$ is a set of n not necessarily distinct complex numbers
such that T is closed under conjugation and every element of T has modulus 1,
then T is a (*) set. Throughout the following, we assume that $1 \le k < n$,
where k and n are positive integers, and we assume that at least one
covariance matrix $\Omega_i \ne I_n$.

Proposition 3. Let $\{\lambda_1, \dots, \lambda_n\}$ be a (*) set. Then there exists an orthogonal
matrix U with eigenvalues $\lambda_1, \dots, \lambda_n$ such that $B = (I_k|Z)U \in \mathcal{B}$ and $D_B = D$
if and only if one of the following conditions holds:

(i) $\lambda_i$ is real for some i.

(ii) $k \ge 2$ and no $\lambda_i$ is real.


Proof: Observe that if at least one $\lambda_i$ is real, say $\lambda_1$, then by (7)
there exists a block diagonal orthogonal matrix U of the form
$U = \begin{pmatrix} \lambda_1 & 0 \\ 0 & C \end{pmatrix}$,
where C is an $(n - 1) \times (n - 1)$ block diagonal orthogonal matrix with
eigenvalues $\lambda_2, \dots, \lambda_n$. Thus, if $U = (u_{ij})$, then $\sum_{j=1}^k u_{j1}^2 = u_{11}^2 = \lambda_1^2 = 1$,
so that $B = (I_k|Z)U \in \mathcal{B}$ and $D_B = D$ (Theorem 2). If no $\lambda_i$ is real, then
n is even, and by (7) there exists a block diagonal orthogonal matrix U with
eigenvalues $\lambda_1, \dots, \lambda_n$ such that $U = \mathrm{diag}(A_1, \dots, A_{n/2})$, where each $A_j$ is
a 2×2 matrix of the form $\begin{pmatrix} a_j & b_j \\ -b_j & a_j \end{pmatrix}$, $b_j \ne 0$. Thus, the first
column of U is $(a_1, -b_1, 0, \dots, 0)^T$ and hence, if $k \ge 2$, then $B = (I_k|Z)U \in \mathcal{B}$
and $D_B = D$.

Conversely, suppose that k = 1. If there exists an orthogonal matrix U
with eigenvalues $\lambda_1, \dots, \lambda_n$ such that $B = (I_k|Z)U \in \mathcal{B}$, then $U \in \mathcal{J}$. Thus,
if $D_B = D$, then U is of the form $\begin{pmatrix} a & 0 \\ 0 & C \end{pmatrix}$, where $a = \pm 1$ and
C is an $(n - 1) \times (n - 1)$ orthogonal matrix. Therefore, a is an eigenvalue
of U and $\lambda_i = a$ is real for some i.



It is natural to consider the analogous condition $D_B = 0$. That is,
given a (*) set $\{\lambda_1, \dots, \lambda_n\}$, does there exist an orthogonal matrix U with
these eigenvalues such that $B = (I_k|Z)U \in \mathcal{B}$ and $D_B = 0$? The answer, as in
the preceding case, is no in general, but it is true in some important cases.

Proposition 4. Let $T = \{\lambda_1, \dots, \lambda_n\}$ be a (*) set. If either

(i) 1 and $-1 \in T$, or

(ii) i and $-i \in T$,

then there exists an orthogonal matrix U with eigenvalues $\{\lambda_1, \dots, \lambda_n\}$ such
that $B = (I_k|Z)U \in \mathcal{B}$ and $D_B = 0$.

Proof. Let $\lambda_1$ and $\lambda_n$ denote the pair 1, $-1$ or i, $-i$, let H be any
$(n - 2) \times (n - 2)$ orthogonal matrix with eigenvalues $\lambda_2, \dots, \lambda_{n-1}$, and let

$$U = \begin{pmatrix} 0 & Z & b_1 \\ Z & H & Z \\ b_2 & Z & 0 \end{pmatrix},$$

where Z denotes an $(n - 2)$ row or column vector
of zeros, and if $\{\lambda_1, \lambda_n\} = \{1, -1\}$, then $b_1 = b_2 = 1$, and if
$\{\lambda_1, \lambda_n\} = \{i, -i\}$, then $b_1 = 1$, $b_2 = -1$.

Clearly, U is an orthogonal matrix. Moreover, the eigenvalues of U
are $\{\lambda_1, \dots, \lambda_n\}$, since $\det(xI_n - U) = (x^2 - b_1b_2)\det(xI_{n-2} - H)$ and
hence the roots of $\det(xI_n - U) = 0$ are the roots of $\det(xI_{n-2} - H) = 0$
together with the roots of $x^2 - b_1b_2 = 0$. Since the roots of the former
equation are the eigenvalues of H, it suffices to show that $\lambda_1$ and $\lambda_n$
are the roots of $x^2 - b_1b_2 = 0$. This follows immediately from the relationship
defined between the values of $\lambda_1$ and $\lambda_n$ and the choices of $b_1$ and $b_2$.
Thus, since we assume k < n, then Theorem 2 implies that $U \in \mathcal{J}$, so
that $B = (I_k|Z)U \in \mathcal{B}$, and, by Theorem 1, $D_B = 0$.
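
The construction in this proof can be checked on a small case. The sketch below assumes n = 4, k = 2, the pair {i, -i} (so $b_1 = 1$, $b_2 = -1$), and takes H to be a quarter-turn rotation; the helper names are hypothetical.

```python
def build_U(H, b1, b2):
    """Block matrix [[0, Z, b1], [Z, H, Z], [b2, Z, 0]] with Z a vector of zeros."""
    size = len(H)
    top = [0.0] + [0.0] * size + [float(b1)]
    middle = [[0.0] + [float(h) for h in row] + [0.0] for row in H]
    bottom = [float(b2)] + [0.0] * size + [0.0]
    return [top] + middle + [bottom]

def is_orthogonal(U, tol=1e-12):
    """True if U U^T equals the identity up to tol."""
    n = len(U)
    return all(abs(sum(U[i][r] * U[j][r] for r in range(n)) - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

def apply(U, v):
    """Matrix-vector product U v (entries of v may be complex)."""
    return [sum(u * x for u, x in zip(row, v)) for row in U]

H = [[0.0, -1.0], [1.0, 0.0]]          # 2 x 2 rotation, eigenvalues i and -i
U = build_U(H, 1, -1)                  # pair {i, -i}: b1 = 1, b2 = -1
k = 2
first_column_weight = sum(U[j][0] ** 2 for j in range(k))

lam = 1j
v = [1.0, 0.0, 0.0, lam]               # candidate eigenvector for eigenvalue i
Uv = apply(U, v)
eigen_error = max(abs(a - lam * b) for a, b in zip(Uv, v))
```

The first column of U is supported only in row n, so the weight $\sum_{j=1}^k u_{j1}^2$ vanishes for every k < n, and formula (5) then gives $D_B = 0$.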

Our next result shows that, if n = 3, then Proposition 4 does not 
characterize those {*} sets T for which there exists an orthogonal matrix 
U with set of eigenvalues T such that B = (I|^jZ)U e ^ and Dg = 0 . We 
will obtain a partial extension of this result to arbitrary n and we v/ill 
make strong use of the extension in our main result. Theorem 7. 

Lenma 5 . Let n = 3, k = 2, and suppose that {X-j , X^, X^} is a (*) set, 
where X^ = a + bi, X^ = a - bi. 

(1) If Xg = 1 , then there exists a 3^3 orthogonal matrix 

U with eigenvalues X^, X^, X^ such that i) e J. and Dg = 0, 

B = {Ij^lZ)U, if and only if a, the real part of X^ and X^, 

is less than or equal to zero; 

(2) if Xg = “1 , then there exists a 3x3 orthogonal matrix U 
with' eigenvalues X^ , Xg, Xg such that U e . J and Dg = 0, 

B = (Ij^lZ)U , if and only if a, the real part of X^ and Xg , 
is greater than or equal to zero. 

Proof . Observe that if U c ^ is such that Dg = 0 , where B = (I|^jZ)U, 

then by Theorems 1 and 2, U is of the form f o ^ i ’ 

\v n 0/ 

where $v = \pm 1$ and A is a 2x2 orthogonal matrix. Moreover, if U has eigenvalues
$\lambda_1, \lambda_2, \lambda_3$, then $\det(U) = \lambda_1 \lambda_2 \lambda_3$. If $\lambda_3 = 1$, then det(U) = 1,
and if $\lambda_3 = -1$, then det(U) = -1. We consider the case $\lambda_3 = 1$, the
case $\lambda_3 = -1$ being similar.

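If v = 1, then A is of the form $\begin{pmatrix} c & d \\ -d & c \end{pmatrix}$ with $c = \pm\sqrt{1 - d^2}$. Then
$\det(xI_3 - U) = x^3 + dx^2 - dx - 1$, so that the eigenvalues of U are

$$1, \quad \frac{-(1+d) \pm i\sqrt{3 - 2d - d^2}}{2}.$$

Thus, there exists U with eigenvalues $\lambda_1, \lambda_2, 1$ if and only if there exists
a real number d, $|d| \le 1$, such that

$$a = \frac{-(1+d)}{2}, \qquad b = \frac{\sqrt{3 - 2d - d^2}}{2}. \tag{8}$$

Since $|d| \le 1$, then $-(1+d)/2 \le 0$, and thus, if U exists, then $a \le 0$.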

Conversely, if $a \le 0$, then d = -(1+2a) satisfies both equations in (8)
and $|d| \le 1$. If v = -1, an argument similar to the preceding one
shows that there exists U with eigenvalues $\lambda_1, \lambda_2, 1$ if and only if $a \le 0$.
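The eigenvalue formula used in this proof can be verified numerically. The following sketch assumes only the characteristic polynomial x^3 + dx^2 - dx - 1 stated above; the function names are hypothetical. It compares the roots returned by `numpy.roots` with the closed form in (8) and checks that d = -(1 + 2a) inverts the relation between d and the real part a.

```python
import numpy as np

def complex_pair(d):
    """Real and imaginary parts (a, b) from equation (8), for |d| <= 1."""
    a = -(1.0 + d) / 2.0
    b = np.sqrt(3.0 - 2.0 * d - d * d) / 2.0
    return a, b

def roots_of_char_poly(d):
    """Roots of x^3 + d x^2 - d x - 1."""
    return np.roots([1.0, d, -d, -1.0])

def d_from_real_part(a):
    """Inverse relation d = -(1 + 2a) used in the converse direction."""
    return -(1.0 + 2.0 * a)

def close_to_any(z, roots):
    return bool(np.any(np.isclose(roots, z)))

for d in (-1.0, -0.5, 0.0, 0.5):
    a, b = complex_pair(d)
    print(d, a, b, np.round(roots_of_char_poly(d), 6))
```

For every admissible d the pair lies on the unit circle, since a^2 + b^2 = ((1+d)^2 + 3 - 2d - d^2)/4 = 1, and its real part is never positive.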


Corollary 6 . Let n and k be positive integers, 1 < k < n, and suppose 
that T = {X^,...,X^} is a (*) set. 

(1) If 1 e T and if there exists a + bi e T, with a < 0, then there 
exists an n x n orthogonal matrix U with eigenvalues T such 
that U e and Dg = 0, where B = (Ij^jZ)!). 

(2) If -1 c T and if there exists a + bi c T, with a > 0, then 
there exists an n x n orthogonal matrix U with eigenvalues T 
such that U c ^ and Hg = 0, where B - (I|^|Z)U . 


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Proof . By Lemma 5 and its proof, if a i-: 0, then A = 



where d = -(1 + 2a), is an orthogonal matrix with eigenvalues 1, a + bi. 

Thus, if U is the n x n block diagonal matrix ^ j » ivhere H 

is an (n - 3) X (n - 3) orthogonal matrix with eigenvalues T\{1, a Lbi} , 
then U is an orthogonal matrix with eigenvalues the elements of T. Therefore, 
if U is the n x n matrix obtained from U by interchanging the third and 
n — rows and columns of U , then U is orthogonal, and, since U is similar 
to U , the eigenvalues of U arc also the elements of T. Finally, since 


the first column of U is 


0 ) 


0 




we have U c 


i. 


and, by Theorems 1 


and 2, Og = 0 , where B = (I|^|Z)U and k <-■ n . The proof of (2) is 
similar. 


We make a few additional observations before stating our main result. 

Let U be an n x n orthogonal matrix with eigenvalues $\lambda_1$, $\{a_j + b_j i\}_{j=2}^{n}$,
where $b_j$ may be zero. Since tr(U) is the sum of the eigenvalues of U,
it follows that if $\lambda_1 = 1$ and $a_j > 0$, j = 2,...,n, then

$$\mathrm{tr}(U) = 1 + \sum_{j=2}^{n} a_j > 1,$$

while if $\lambda_1 = -1$ and $a_j < 0$ for j = 2,...,n, then

$$\mathrm{tr}(U) = -1 + \sum_{j=2}^{n} a_j < -1.$$

Also, if A is orthogonal and det(A) = -1,
then -1 is an eigenvalue of A. This follows immediately from the fact that
det(A) is the product of the eigenvalues of A, repeated to their respective
multiplicities. Finally, if A is orthogonal, n x n, and n is even, then
det(A) = -1 implies that both -1 and 1 are eigenvalues of A.


Theorem 7 . Let n and k be positive integers, 1 :i k < n, let U be 


an n X n orthogonal matrix, and let B = (I|^|Z)U be such that Dg = D. 
L 1 Z \ 

J U and if B = (I(^lZ)U , then B = B , so 


If U = 


that Dg = Dg = D. Either U or U is similar to an n x n orthogonal 
matrix e J! such that Dg^ = 0, where 6^ = (Ij^!Z)U^. 


Proof. Note that the matrix $\tilde{U}$ differs from U only in that the last row of
$\tilde{U}$ is the negative of the last row of U. Clearly, since k < n, we have
$\tilde{B} = B$.

Now suppose that n is even. If det(U) = -1, then 1 and -1 are 
eigenvalues of U and thus, by Proposition 4, there exists an orthogonal 
matrix similar to U such that B-j = (Ij^|Z]U^ z ^ and Dg^ = 0 . If 

det(U) = 1, then det(U) = -1, and the above argument applied to U yields 

the same conclusion. 

Suppose that n is odd. Then U must have at least one real eigenvalue,
$\lambda$. If $\lambda = 1$ and if U has another eigenvalue a + bi, $a \le 0$, then the
conclusion follows from (1) of Corollary 6. Similarly, if $\lambda = -1$ and if U
has another eigenvalue a + bi, $a \ge 0$, then the conclusion follows from (2)
of Corollary 6. Suppose now that $\lambda = 1$ is an eigenvalue of U and that
a > 0 for all other eigenvalues a + bi of U. Then det(U) = 1 and
tr(U) > 1. Since $\det(\tilde{U}) = -1$, it follows that -1 is an eigenvalue of $\tilde{U}$,
and, since $\mathrm{tr}(\tilde{U})$ can differ from tr(U) by at most 2, we have that
$\mathrm{tr}(\tilde{U}) > -1$. Thus, $\tilde{U}$ must have an eigenvalue of the form c + di, where
c > 0, and hence, by (2) of Corollary 6, there exists an orthogonal matrix
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, similar to U , such that = (Ij^|Z)Ui e ^ and Dg = 0 . The 
case in which $\lambda = -1$ is an eigenvalue of U and a < 0 for all other
eigenvalues a + bi of U is handled in a similar manner, and we omit the
proof.


3. Conclusion. This paper provides an example to show that, even under
extremely strong conditions, the eigenvalues of U do not affect the value
of divergence in the space of reduced dimension k.


A Review of the LEC Performance Evaluation of UHMLE


In March 1976, Lockheed was directed to submit a plan [1] for
comparative evaluation of several candidate signature extension algorithms.

The results of that test [2], carried out by LEC in April, were the basis
for selection of two algorithms [3], OSCAR and ATCOR, for test and
implementation in a sub-operational system by IBM. Four simulated (SIM) data sets
and seven consecutive day (CD) data sets were used. In the following sections,
two points will be addressed for each data set: 1) analysis and evaluation
of the UHMLE test; 2) recommendations on changes in the UHMLE algorithm
motivated by the test. The criterion for evaluation of each algorithm will be
overall classification accuracy (Tables 8 and 9 of [2] are attached for
convenience).

I. Simulated Data Test.

In previous tests carried out by the University of Houston, consistently
good results were observed using essentially the same data set. The poor
performance of UHMLE on SIM1 and the marginal performance on SIM4 seem
to contradict our previous experience. The following observation on the LEC
test may explain this discrepancy.

In SIM1 the iteration sequence seemed to converge before the signatures
had moved into the unlabeled data region. A second run, which first estimated
an initial translation X + B and then applied the general UHMLE algorithm,
was successful. Even though translation was included in our operational
algorithm delivered to JSC, the second run was not reported in the final LEC
analysis.










    Pass     Local       1st LEC       2nd LEC UHMLE TEST
             Accuracy    UHMLE TEST    w/translation option

    SIM1     93.5        -21.7         -2.5
    SIM2     98.6         -0.7         no trans.
    SIM3     97.0         -1.0         "   "
    SIM4     92.8         -5.0         "   "

    Ave.     95.5         -7.1         -2.3
    Std.                   9.9          2.0


Table 1

Revised SIM test results.
Overall Accuracy Difference


The use of the translation in SIM1 would dramatically change the outlook
of UHMLE in the SIM test.

The results do not suggest any modifications of the UHMLE algorithm 
except to re-state the need to apply the translation first. 
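The summary rows of Table 1 can be reproduced from the four per-pass differences. The sketch below assumes that the "Std." row is a sample standard deviation (n - 1 denominator), an assumption that happens to match the printed values, and that passes run without the translation option keep their first-run difference in the second column. The variable and function names are illustrative only.

```python
from statistics import mean, stdev

# Accuracy differences (UHMLE minus local) taken from Table 1.
first_run = {"SIM1": -21.7, "SIM2": -0.7, "SIM3": -1.0, "SIM4": -5.0}
# Only SIM1 was rerun with the translation option; the others are unchanged.
second_run = dict(first_run, SIM1=-2.5)

def summarize(diffs):
    """Return (mean, sample standard deviation), rounded to one decimal."""
    values = list(diffs.values())
    return round(mean(values), 1), round(stdev(values), 1)

print("1st test:", summarize(first_run))
print("2nd test:", summarize(second_run))
```

The change in the mean comes entirely from the SIM1 entry, which is the point the text makes about applying the translation first.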


II. Consecutive Day Test.

General : The consecutive day (CD) data set consisted of three Kansas 
Intensive Test Sites (ITS) outlined in [1]. From these a total of seven 
pairs of consecutive day passes were selected from 1973-74 LANDSAT-1 data 
acquisitions . 
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ITS 

DATA SET 
ID 

DATE 

TRAINING/RECOGNITION 

— 

HAZE 

TRAINING 

RECOGNITION 

Finney 

F1709-8 

2/1 July 74 

5x6 



n 

F1673-2 

27/26 May 74 

II 

X 


iJ 

F1655-4 

9/B May 74 

II 



II 

F1726-7 

19/20 July 74 

1) 

X 


Saline 

S1455-4 

21/20 Oct 73 

3x3 



M 

S1725-4 

18/17 July 74 

II 



Ellis 

E1726-5 

12/11 June 74 

3x3 

1 

1 

i 

X 1 

1 


Table 2 

Consecutive Day Data Sets 


Two UHMLE tests were run on each data set. UH/ALL uses as its unlabeled 
sample the rectangular area containing the selected Test/Training fields. 
UH/FIELDS uses the test fields only as input. The following ground areas 
associated with each ITS are defined for further reference. 

A0 - ITS ground truth site. (Not aligned with LANDSAT ground
track.)

A1 - Smallest rectangular field containing selected training field.
Used as input for UH/ALL.

A2 - A0 intersect A1, used for classification area.

A3 - Designated test fields (= training fields within A2). Used
for input to UH/FIELDS.







Proportion Estimates. UHMLE automatically estimates a proportion vector
for the unlabeled input data set. These estimates are used in two ways in
the Signature Extension (SE) test.

1) The UHMLE proportion estimates are used as a priori probabilities
in the classification algorithm. Although this is not an unreasonable
choice for the a priori probabilities, the UHMLE classification results are
not comparable to those of the other candidate algorithms, which used equally
likely a priori probabilities. Moreover, in the UH/ALL test, the UHMLE
proportion estimates correspond to Area A1. Area A2 was classified and only
results from Area A3 were used for performance evaluation. In UH/FIELDS the
unlabeled input data set and the classification region were equivalent.

2) In Tables 10-13 in [2], the estimated proportion of wheat for
each algorithm is first compared to the local classification proportion
estimate and then to the ground truth proportion estimate for both the SIM
and CD data sets. In the CD test, the UH/ALL and UH/FIELDS are classification
proportion estimates for area A2. The maximum-likelihood estimates from UHMLE
(UH/ALL/MLE) correspond to area A1. It is assumed here that the proportion
estimate from local classification in Table 11 of [2] is based on A2. Hence
UH/ALL/MLE is not comparable to the local standard. In Table 13 of [2] the
standard is ground truth. It is not clear whether the ground truth
proportions correspond to A0 or A2. In either case, the proportion
estimates listed in that table are not comparable.


Data Quality . This appears to be the most important factor in analyzing 
the UHMLE results. The CD data sets contained numerous data drops or 
"glitches." LEC was careful to choose training segments and fields so as 
to avoid this bad data in the computation of training statistics. However, 
several of the recognition segments used as input to UHMLE (in both UH/ALL 
and UH/FIELDS) were contaminated. This bad data effectively "captured" 
subclasses from both wheat and non-wheat categories and distorted means 
and particularly covariances in other subclasses. Only the data quality in 
Area A2 could be assessed from the available computer output. Further data 
drops, which may have been present in A1 (outside of A2), could also have an
apparent degrading effect on UH/ALL test results. The incidence of
contaminated data is listed below in Table 3. We strongly
recommend that this be the last time that this data set is used in any
testing procedure.


    Data Set     UH/FIELDS    UH/ALL

    F 1709-8     Slight       Slight
    F 1673-2     Bad          Bad
    F 1655-4     Bad          Bad
    F 1726-7     Bad          Bad
    S 1455-4     Slight       Slight
    S 1725-4     Good         Good
    E 1726-5     Good         Good


Table 3

Incidence of Data Drops in CD Data Sets



Label Switching: In the UHMLE algorithm the various subclass statistics
move in a quasi-independent manner to better "fit" the unlabeled data set.
In this process a subclass component of the mixture model may seek out data
in the unlabeled sample which is from a different category than the one
assigned in the training segment. This poses no difficulty in terms of
density estimation; however, correct category labels are required for acreage
proportion estimates. This phenomenon is compounded by subclasses being
"captured" by data drops, leaving unmodeled data free to be absorbed by an
existing subclass. In a number of the CD tests substantially improved
results are obtained if the label on a single subclass is reassigned.
Interaction of the AI or DPA (at this point, prior to aggregation of acreage
proportion estimates at the category level) with the view of detecting obvious
category labeling errors, should be considered. This is a key point. We are
simply saying that, when using UHMLE (or other algorithms), the spectral class
identity extrapolated from the training segment may not be sufficient to
establish crop category identity without AI interaction.
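The effect of a single label switch on category-level proportions can be illustrated with a small sketch. Everything in it is hypothetical: the subclass names, the mixture proportions, and the category labels are made up for illustration and are not values from the LEC test. It only shows that the fitted mixture is unchanged by relabeling while the aggregated wheat proportion is not.

```python
def category_proportions(subclass_props, labels):
    """Aggregate subclass mixture proportions to the category level."""
    totals = {}
    for name, proportion in subclass_props.items():
        category = labels[name]
        totals[category] = totals.get(category, 0.0) + proportion
    return totals

# Hypothetical subclass proportions estimated on an unlabeled segment.
subclass_props = {"W1": 0.20, "W2": 0.15, "W3": 0.10, "N1": 0.35, "N2": 0.20}

# Labels carried over from the training segment (hypothetical).
training_labels = {"W1": "wheat", "W2": "wheat", "W3": "wheat",
                   "N1": "non-wheat", "N2": "non-wheat"}

# One subclass reassigned after inspection; the mixture itself is untouched.
revised_labels = dict(training_labels, W3="non-wheat")

before = category_proportions(subclass_props, training_labels)
after = category_proportions(subclass_props, revised_labels)
print(before)
print(after)
```

In this made-up case the wheat proportion drops from 0.45 to 0.35 although no density parameter has moved, which is why the label check is placed before aggregation at the category level.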


Individual CD Data Set Results. In this section each CD data set test is
analyzed separately. Some revised results are reported along with supporting
rationales.

F 1709-8 Two classes have inflated variances due to a data drop. However, 
both UH/ALL and UH/FIELDS do better than local classification. 


F 1673-2 Very poor performance in both cases is observed. Two data
drops have a major effect in distorting variances and means of several
subclasses. If one subclass, which is obviously mislabeled, is switched from
wheat to non-wheat, a substantial improvement is observed.

                         LEC Test                    Revised
    Local             UH/FIELDS   UH/ALL       UH/FIELDS   UH/ALL

    96.1      0.1       -23.7      -21.3         -3.1       -8.6



In Figure 2, the subclass means determined by UHMLE are plotted in the TACAP 
"brightness x green" coordinate system. Subclass W7 is clearly displaced 
from the other wheat subclasses. It is not unreasonable for mislabeling of 
this magnitude to be easily detected by an AI or DPA and corrected at the 
time of acreage estimation. 
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Figure 2. TACAP plot of class means. 
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F 1655-4 Again two data drops play a large role in distorting several 
subclass signatures in UH/ALL. One label switch again improves matters
greatly. In UH/FIELDS the effects of the data drops are not as apparent in
the overall classification accuracy.

                         LEC Test                    Revised
    Local             UH/FIELDS   UH/ALL       UH/FIELDS      UH/ALL

    94.9     -3.8       -3.1       -15.0       not revised     -3.3


F 1726-7 Data drops substantially distort four subclasses in UH/ALL and 
to a lesser extent in UH/FIELDS. Even so, results are excellent (better than 
local classification) in UH/FIELDS. UH/ALL results are poor. No clear 
label switch is apparent. 


S 1455-4 In this data set only four subclasses are modeled. Two subclasses 
are distorted by data drops, one severely in both cases. In the UH/ALL case 
the A1 area is much too large, introducing a large segment of extraneous data 
into the unlabeled sample. Further AE is not contained in A1 (see Figure 3), 


Figure 3. Field Definition Errors in S 1455-4. (Points labeled in the figure: (129,24) and (71,94).)

The poor data quality, errors in field definitions, and small number of 
subclasses render the interpretation of this test null and void. Inclusion 
of this test in the overall UHMLE evaluations is, therefore, meaningless. 

S 1725-4 There are no data drops or anomalies in this test.

E 1726-5 There are no data drops. A reasonable case could be
made for a label switch; however, the explanation is not as obvious as in
the previous data sets and it will be omitted here. This case appears to be a
reasonable test of the algorithm.
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Q 

Summary of CD Test. If we introduce the three label changes (easily
detected by an AI or DPA) suggested in F 1673-2 and F 1655-4 and omit
the unacceptable test of S 1455-4, the performance of the algorithm is
distinctly different from that reported in [2]. In light of the results
presented here, the conclusions drawn by LEC in [2] concerning the relative
performance of UHMLE are, at best, questionable. The original results along
with the aforementioned revision and omission are listed in Table 4 below.




                              LEC Original              Revised
    Data Set     Local     UH/FIELDS   UH/ALL     UH/FIELDS   UH/ALL

    F 1709-8     79.5         2.7        7.3        same        same
    F 1673-2     96.1       -21.3      -23.7        -3.1        -8.6
    F 1655-4     94.9        -3.1      -15.0        same        -3.3
    F 1726-7     80.0         0.9       -6.8        same        same
    S 1455-4     86.5       -12.1      -29.5        OMIT
    S 1725-4     85.4        -4.3        0.9        same        same
    E 1726-5     66.2         1.4       -7.3        same        same

    Mean                     -5.1      -10.6        -0.92       -2.97
    Std. Dev.                 8.7       13.1         2.9         6.1


Table 4.

Revised UHMLE Test Results.

Overall Classification Accuracy Differences.
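The summary rows of Table 4 follow from the per-data-set entries once "same" is read as "carry the original value forward" and the omitted S 1455-4 test is dropped from the revised columns. The sketch below makes those two readings explicit and again assumes a sample standard deviation (n - 1 denominator); the data structure and function name are illustrative, not part of the original analysis.

```python
from statistics import mean, stdev

# Columns: original UH/FIELDS, original UH/ALL, revised UH/FIELDS, revised UH/ALL.
# "same" entries repeat the original value; None marks the omitted test.
table4 = {
    "F 1709-8": (2.7, 7.3, 2.7, 7.3),
    "F 1673-2": (-21.3, -23.7, -3.1, -8.6),
    "F 1655-4": (-3.1, -15.0, -3.1, -3.3),
    "F 1726-7": (0.9, -6.8, 0.9, -6.8),
    "S 1455-4": (-12.1, -29.5, None, None),
    "S 1725-4": (-4.3, 0.9, -4.3, 0.9),
    "E 1726-5": (1.4, -7.3, 1.4, -7.3),
}

def column_summary(column):
    """Mean and sample standard deviation of one column, skipping omitted tests."""
    values = [row[column] for row in table4.values() if row[column] is not None]
    return mean(values), stdev(values)

for index, title in enumerate(
        ("orig UH/FIELDS", "orig UH/ALL", "rev UH/FIELDS", "rev UH/ALL")):
    m, s = column_summary(index)
    print(f"{title:15s} mean {m:7.2f}  std {s:5.1f}")
```

The revised means are taken over six data sets rather than seven, so part of the improvement reflects the omission and part reflects the label changes.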
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We maintain that there is considerable evidence (provided, in part, by this 
analysis) for rejecting the original analysis and conclusions. If for no
other reason, the poor data quality in five of the seven CD data sets chosen
renders the LEC test results, as they pertain to UHMLE, invalid.

III. Conclusions . 

Although the LANDSAT-2 data does not contain nearly the frequency of 
data drops observed in the LANDSAT-1 data used for this test, we clearly 
must incorporate a data editing scheme into the UHMLE algorithm or assume 
that preprocessing has deleted these pixels. There has been preliminary 
testing of a thresholding scheme which appears to be an adequate method when 
used in conjunction with an initial X + B translation. 

The reassessment of labels after signature extension remains a major
priority in the UHMLE signature extension algorithm. This is a small task
in terms of time compared to complete local training by the AI, and appears
to be a necessary AI interaction function coupled with automatic processing
of recognition segments.


SUMMARY 


Our comments on the SIM test and on the CD test suggest that the
UHMLE algorithm in particular and mixture density estimation in general
should still play an important role in the solution of the signature
extension problem. In another paper [4], the signature extension problem
is reformulated in the context of the LACIE training procedure (e.g.,
Procedure 1). Mixture density estimation (supervised or unsupervised) will
certainly play a role in the extraction of the Spectral Information Classes
described in that paper. We believe additional work on the UHMLE algorithm,
especially the details of incorporating it into the LACIE training procedure,
to be essential. These details are treated in the reformulation given in [4].
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TABLE 8.- OVERALL ACCURACY FOR SIMULATED DATA* 

[A minus sign means the algorithm was less
accurate than local classification.]


                 Local       Percentage difference between local accuracy
    Data         accuracy    and that obtained with various algorithms

                             R(S)    MLEST    UH fields    R(C)      UT

    SIM1         93.5         0.0     -3.5      -21.7      -29.6    -99.3
    SIM2         98.6         0.0      0.0       -0.7        0.0    -18.3
    SIM3         97.0         0.1      0.0       -1.0       -5.2    -50.0
    SIM4         92.8        -0.1                -5.0       -2.9     -8.8

    Mean         95.5         0.0                -7.1       -9.4    -44.1
    Std. dev.     2.8         0.1                 9.9       13.6     40.8


*Prepared by LEC [2].
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Introduction: 

The following algorithm has been suggested by Decell and
Smiley in [1] for optimal linear combinations in the feature
selection problem.

Let V be a continuous function from (see definition 1) 

into R-*- that Is invariant under multiplication on the left 

' by kxk invertible matrices. Then there exists %l 

J- n 

(see definition 2) such that 

</^ ( tlj.! Z] H ) = l.u.b.f ( L I, I Z] H )1 . 


Now for each positive integer i, let the element H f /-/ 


n 


be chosen such that 


V^( ■ -Hi) 

The question of whether or not the above process terminates
at an absolute $\psi$-extremum (rank k maximal statistic) appeared in
[1]. In this paper, we show that there exists a function $\psi$ as above
for which the above process does not terminate at an absolute
$\psi$-extremum.

Let $H_1, \ldots, H_p$ be the matrices representing Householder
transformations. Then for the matrix $[I_k | Z]H_1 \cdots H_p$, let $\theta([I_k | Z]H_1 \cdots H_p)$
be the span in $R^n$ of the k row vectors of that matrix. Suppose
that $v_1, \ldots, v_k$ are linearly independent vectors in $R^n$. Then we show
in this paper that there exists some integer $p \le \min(n, n-k)$ and
Householder transformations whose matrices are $H_1, \ldots, H_p$ for which
$\theta([I_k | Z]H_1 \cdots H_p) = \mathrm{Span}\{v_1, \ldots, v_k\}$. We also determine the minimum
integer p having the above property.

Preliminaries : 

Definition 1. Let be the set of all kxn rank k matrices. 

n 

Definition 2. Let denote the set of all Householder trans- 
formations . 

Definition 3* Let>§|^ denote the collection of all vector 

subspaces of of dimension k. 

Definition Let = {x€R" t Ijxll = 2 } . 

Definition 5. Let^be a closed subset of and x 4^ C • Then 

there exists c € C such that Hx~c ll - //x-c// for any 

X A 

ccC. Let p(x;C) = //x-c^^// . 

Definition 6, Let A and B be elements of>^^ . Then there exists 

n 

an element a* G AO s” having the property that 
pCa»; ^ ^(a; EHS”) for all aGAHs^. The num- 

ber ^(a*; BHs^) will be called the distance from A to B 
and will be denoted by the symbol d(A;B). 

Proposition 1, For any elements A, B, and C in ^ ^ 

1) dCA;B) ^ 0 and d(A;B) = 0 if and only if A = B. 

ii) d(A;C) ^ d(A;B) + d(B;C). 

lii) For any ^^0 there exists a<f:v0 such that whenever 
d{A;B) ^ iT j then d(B;A) -i. ^ . 

Definition 7. For any PGj^I^ and £* 0 , let 

d(X;P)^ 

Definition 8. Let T be the topology on determined by the 

subbasis ^(P) | 0 and ^ | • 


/ 
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Definition 9. Let G be a closed subset ^^>i3n* 

Let D(P; C ) = g.l .b . d(P;C) | C G C] . 

Proposition 2, is normal. 

Proof; Let (5 and ^ be two closed disjoint subsets of^ 
Let = {pe^J^ I D(P; <T) V(P;0)j and 

t?2 = { I D(P;C?) ^ D(F;0)j , By Proposition 1, 

we can determine that are both, open and are 

disjoint. This completes the proof. 


Definition 10. For any vector $w = (w_1, \ldots, w_n)^T$ in $R^n$, let

$$w^U = (w_1, \ldots, w_k)^T \quad \text{and} \quad w^L = (w_{k+1}, \ldots, w_n)^T.$$



Proposition 3. Suppose that $\{v_1, \ldots, v_k\}$ is a collection of
linearly independent vectors in $R^n$. Let p be the dimension
of $\mathrm{Span}\{v_1^L, \ldots, v_k^L\}$ and assume p > 0. Then there
exists a vector $x \in R^n$ such that $\|x\| = 1$, and if $H_x$ is
the Householder transformation determined by x, then the
dimension of $\mathrm{Span}\{H_x(v_1)^L, \ldots, H_x(v_k)^L\}$ = p-1.

Proof: Case i) Dimension of Span v^, . . . , v|^ j is less 

than k. We select a vector x^ in Span v^, . . . ,v^| such 
that Hx^lf ~ V , Since |[v^"-2(v^*x^)x^J = 0 for 

i=l,...,k. It follows that the dimension of 
Span ^ v^-2(v^«x^)x^, . . . ,v|^-2(v^-x^)x^] is p-1. Now by 
assumption there exists a vector x^ in such that 
llx^H and v^-x^ = 0 for 1=1,..., k. Since 

v^-2(vj^-x)x^ = v^-2(v^«x^)x^, then the dimension of 


/ 


;3 


n 



x^)x^. 


,v^-2(v 


L 

k 



Is P“l, for 


Case ii) The dimension of Span|v^, . . . ,vj^ j 

We select a vector x^ in Span ’ * * ’ ’'^kl 

Then we have that the dimension of 

Span |vJ'-2(v^.x^)x^,...,v^-2(v^.x^)Xq j Is 

assume then that x^ = A x^ for some 7\ ^ 1. 

o 

vector x^ in such that if x - |x^j 

2 

11x^11 = 1 and v^-2(v-.x)x^ = v^--2(v^.x^ 

11 1 1 o 


= k. 

with 11x^11= 

p-l . We 
We want a 
then jlx^l|^+ 

)x^ for i=l,. . . ,k. 
o 


By substituting x^ into this equation in place of x^ we 
can determine that v^.x^ = (•^^^)v^,x^ for i=l,,..,k. 

By our assumption we can find a vector x^ satisfying the 
above equations whenever a choice of A is made. We ob- 
serve that if A approaches 1, then 11x^11 must approach 
0, and llx^ll must approach so that if A approaches 
Ij then ilx^'^ii + iix^ir must approach If A approaches 

Oj then llx“ll approaches + co and lix^H approaches 0 
so iixUir + II x^ll approaches +co as A approaches 0. 

It follows from this that there exists some A for which 

2 II 2 

lIx^H + jlx^ll = 1. Thus we have the dimension of 
Span^v^-2(v^.x)x^, . . . ,v^-2(Vj^.x)x^^ is p-l which is the 
required condition. This completes the proof of proposition 

3 . 


/ 


Definition 11. For any k x n rank k matrix M (see Definition 1), let
$\theta(M) = \mathrm{Span}\{v_1, \ldots, v_k\}$, where $\{v_1, \ldots, v_k\}$ are the row
vectors of M. $\theta$ is easily seen to be continuous.

Proposition 4. Suppose that $\theta([I_k | Z]H_1 \cdots H_p) = \mathrm{Span}\{v_1, \ldots, v_k\}$
for Householder transformations $H_1, \ldots, H_p$. Then the
dimension of $\mathrm{Span}\{v_1^L, \ldots, v_k^L\}$ cannot exceed p.

Proof: We observe first of all that for any collection
of vectors $\{y_1, \ldots, y_m\}$ and Householder transformation
$H_x$ determined by the vector x that
$\mathrm{Span}\{H_x(y_1), \ldots, H_x(y_m)\} \subset \mathrm{Span}\{y_1, \ldots, y_m, x\}$.
Now $\theta([I_k | Z]H_1 \cdots H_p) = \mathrm{Span}\{H_p \cdots H_1(e_1), \ldots, H_p \cdots H_1(e_k)\}$,
where $e_i$ is the vector with 1 in the $i$th place and 0
everywhere else. Thus by the above statements,
$\mathrm{Span}\{v_1, \ldots, v_k\} \subset \mathrm{Span}\{e_1, \ldots, e_k, x_1, \ldots, x_p\}$.
It follows that $\mathrm{Span}\{v_1^L, \ldots, v_k^L\} \subset \mathrm{Span}\{x_1^L, \ldots, x_p^L\}$.
Thus the dimension of $\mathrm{Span}\{v_1^L, \ldots, v_k^L\}$ is less than or
equal to p. This completes the proof of Proposition 4.
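The two facts used in this proof can be checked numerically for a single Householder transformation. The sketch below is an illustration only: the sizes n = 5 and k = 2, the generating vector x, and the helper names are all hypothetical. It builds H = I - 2xx^T for a unit vector x, forms the rows of [I_k | Z]H, and verifies that they lie in the span of e_1, ..., e_k and x, so that their trailing n - k coordinates (the lower parts in the sense of Definition 10) span a space of dimension at most 1.

```python
import numpy as np

def householder(x):
    """Matrix of the map y -> y - 2 (y . x) x, with x normalized to unit length."""
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)
    return np.eye(x.size) - 2.0 * np.outer(x, x)

def span_dim(vectors):
    """Dimension of the span of the given row vectors."""
    return int(np.linalg.matrix_rank(np.atleast_2d(vectors)))

n, k = 5, 2                                   # hypothetical sizes
x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])      # hypothetical generating vector
h = householder(x)

selector = np.hstack([np.eye(k), np.zeros((k, n - k))])   # [I_k | Z]
rows = selector @ h                                        # rows of [I_k | Z] H

basis = np.vstack([np.eye(n)[:k], x])         # e_1, ..., e_k together with x

print(span_dim(basis), span_dim(np.vstack([rows, basis])))
print(span_dim(rows[:, k:]))                  # dimension of the lower parts
```

With p transformations the same argument adds at most one new direction per factor, which is the bound stated in the proposition.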

Proposition 5. For linearly independent vectors $\{v_1, \ldots, v_k\}$,
if p is the dimension of $\mathrm{Span}\{v_1^L, \ldots, v_k^L\}$ and p > 0, then
there exist Householder transformations $H_1, \ldots, H_p$
such that $\theta([I_k | Z]H_1 \cdots H_p) = \mathrm{Span}\{v_1, \ldots, v_k\}$ and no
fewer than p Householder transformations can have this
property.

Proof: This is a consequence of Propositions 3 and 4.
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ORIGINAL PAGE IS POOR 


Construction of the map ^ 


Definition 12. For any let P = Span^v^, . . , and 

define L(P) = the dimension of Span^v^, . . . . 

Definition 13. For O^p^n-k let^p = [ AcJk|L(A)^p] . 

Proposition 6, is closed for p=0,..,,n-k. 

Proof: This is a consequence of the fact that if 

. . . jUjjj}* is a collection of vectors in and q 

is the dimension of Span^u^ , . . , ,u^^ then there exists a 
real number such that if jjuj^-u||| for i=l,...,m, 
then the dimension of Span^u|, . . . ,u*| is greater than or 
equal to q. This completes the proof of Proposition 6. 

Now for some pe^^ there exists ^-^0 such that if Ae , then 
does not contain P. Let be the closure in^j!^ of 
. By Urysohns lemma, f 2j there exists a continuous 




function C R^ such that = 1 and (|>^(A)=0 

for any k€iQ. Let I = Span^e^, , . . . Then C 6^ 

since le Define a map 2 n"^ t- 

(j) 2 (X) = 0 if and c() 2 (X) = g-d^X;I) if xcl^^d). 

Let (|) = (|)j^ + and define ^ =<^®0. We observe that 

Si =e([[i,U]H|HeWj) . Also if 0 ( zj = I 
for some then for any 0 ( zj . 

That V has the desired properties follows from the fact that 


the function ^ has a maximum value of at I over the set dC 
but c|) has a maximum value of 1 at P over the entire space ^ 


i 
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Sufficient Statistics for Mixtures 


of Measures in a Homogeneous Family 

by 

Charles Peters 
Department of Mathematics 
University of Houston 


1, Introduction ; 

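Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces and let $T : X \to Y$ be
surjective and measurable. Let $\mathcal{M}$ be a set of finite positive measures on
$(X, \mathcal{A})$. For each $\mu \in \mathcal{M}$ there corresponds a measure $\mu T^{-1}$ on $(Y, \mathcal{B})$ defined
for $F \in \mathcal{B}$ by

$$\mu T^{-1}(F) = \mu(T^{-1}(F)).$$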

If f is a $\mu$-integrable real valued function on X, then as a consequence of
the Radon-Nikodym Theorem, there is a $\mu T^{-1}$-integrable function $e_\mu(f)$ on Y
satisfying

$$\int_F e_\mu(f)\, d\mu T^{-1} = \int_{T^{-1}(F)} f\, d\mu$$

for each $F \in \mathcal{B}$. Clearly $e_\mu(f)$ is defined only up to sets in $\mathcal{B}$ of $\mu T^{-1}$
measure 0, and f = g a.e. ($\mu$) implies $e_\mu(f) = e_\mu(g)$ a.e. ($\mu T^{-1}$). The
linear operator $e_\mu$ defined as above maps the space $L_1(X, \mathcal{A}, \mu)$ to the space
$L_1(Y, \mathcal{B}, \mu T^{-1})$ and is called the conditional expectation operator. Its value
$e_\mu(f)$ at $f \in L_1(X, \mathcal{A}, \mu)$ is called the conditional expectation of f given T.


The conditional probability of an event $E \in \mathcal{A}$ is defined as

$$P_\mu(E) = e_\mu(\chi_E),$$

where $\chi_E$ is the indicator function of E. The conditional probability
functions satisfy

(a) :Cb^J(Y,~l3, yT~b. 

where ^(Y, ,pT~^) Is the set of all real valued 2 ?-ineasureable functions 

on Y, with equality defined as equality a,e. (pT ^) - 

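(b) For each $F \in \mathcal{B}$,

$$\mu(E \cap T^{-1}(F)) = \int_F P_\mu(E)\, d\mu T^{-1}.$$

(c) $0 \le P_\mu(E) \le 1$ for each $E \in \mathcal{A}$ and $P_\mu(X) = 1$.

(d) If $\{E_n\}_{n=1}^{\infty}$ is a disjoint sequence of events in $\mathcal{A}$,

$$P_\mu\Big(\bigcup_{n=1}^{\infty} E_n\Big) = \sum_{n=1}^{\infty} P_\mu(E_n) \quad \text{a.e. } (\mu T^{-1}).$$

It should be noted that $P_\mu$ satisfies property (c) even when $\mu$ is not a
probability measure.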

The transformation T is called a sufficient statistic for $\mathcal{M}$ if
for each $E \in \mathcal{A}$ there is a $\mathcal{B}$-measurable function P(E) on Y such that
for each $\mu \in \mathcal{M}$, $P_\mu(E) = P(E)$ a.e. ($\mu T^{-1}$). The set $\mathcal{M}$ is dominated by
a measure $\lambda$ (perhaps not in $\mathcal{M}$) if for each $\mu \in \mathcal{M}$, $\mu$ is absolutely
continuous with respect to $\lambda$ (written $\mu \ll \lambda$). $\mathcal{M}$ is homogeneous if it is
dominated by each of its members. A measure $\lambda$ is equivalent to $\mathcal{M}$ if
$\lambda$ dominates $\mathcal{M}$ and $\mu(E) = 0$ for each $\mu \in \mathcal{M}$ implies $\lambda(E) = 0$.
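On a finite sample space the definition of sufficiency reduces to a direct computation, which the following sketch illustrates. It is an assumption-laden toy example rather than anything from the paper: the space is the set of outcomes of two coin flips, the family consists of three product Bernoulli measures with hypothetical parameters, and the function names are invented. For a point x in a fibre of T the conditional probability is mu({x}) / mu(T^{-1}(T(x))), and T is sufficient for the family exactly when these values do not depend on the measure.

```python
from fractions import Fraction
from itertools import product

def conditional_given_t(measure, statistic):
    """Map each point x to mu({x}) / mu(T^{-1}(T(x))) on a finite space."""
    fibre_mass = {}
    for x, mass in measure.items():
        fibre_mass[statistic(x)] = fibre_mass.get(statistic(x), 0) + mass
    return {x: mass / fibre_mass[statistic(x)] for x, mass in measure.items()}

def bernoulli_pair(p):
    """Product measure of two independent Bernoulli(p) flips."""
    return {(a, b): (p if a else 1 - p) * (p if b else 1 - p)
            for a, b in product((0, 1), repeat=2)}

def is_sufficient(family, statistic):
    """True when every measure in the family has the same conditional probabilities."""
    conditionals = [conditional_given_t(mu, statistic) for mu in family]
    return all(c == conditionals[0] for c in conditionals)

def total(x):
    return x[0] + x[1]

def first(x):
    return x[0]

# Hypothetical homogeneous family: every outcome has positive mass under each member.
family = [bernoulli_pair(Fraction(1, 4)),
          bernoulli_pair(Fraction(1, 2)),
          bernoulli_pair(Fraction(2, 3))]

print(is_sufficient(family, total))   # number of successes
print(is_sufficient(family, first))   # first flip only
```

The number of successes passes the check because the conditional distribution within each fibre is uniform for every p, while the first flip alone does not.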

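The notation and terminology used in this paper are taken from (Halmos
and Savage, 1949), as are the following three theorems. The notation
$d\mu/d\lambda \; (\epsilon) \; T^{-1}(\mathcal{B})$
means that there is an element of the equivalence class of
Radon-Nikodym derivatives $d\mu/d\lambda$ which is $T^{-1}(\mathcal{B})$ measurable.

Theorem 1: If $\mathcal{M}$ is dominated, then a statistic T is sufficient for $\mathcal{M}$ if
and only if there exists a measure $\lambda$ equivalent to $\mathcal{M}$ such that
$d\mu/d\lambda \; (\epsilon) \; T^{-1}(\mathcal{B})$ for each $\mu \in \mathcal{M}$.

Theorem 2: If $\mathcal{M}$ is dominated, then a statistic T is sufficient for $\mathcal{M}$ if
and only if T is sufficient for each pair $\{\mu, \nu\}$ of elements of $\mathcal{M}$.

Theorem 3: If $\mathcal{M}$ is homogeneous, then a statistic T is sufficient for $\mathcal{M}$ if
and only if $d\mu/d\nu \; (\epsilon) \; T^{-1}(\mathcal{B})$ for each $\mu, \nu \in \mathcal{M}$.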

Homogeneous Families : 

Henceforth, we will rissume that ’"/At Is homogeneous. Let C(7/b denote the 
cone generated by }'/l , excluding the zero measure* That is, is the set of 

all finite linear combinations, with strictly positive coefficientSjj of elements 
of '7H • Liements of C(J^) are termed mixtures of elements of • Clearly, 
C(j:i) iri also homogeneous; hence, the spaces ^ same 

for Vi £ CC7?0 and may be denoted simply by<^ , For p £ maps {L to 

^ and it is clear from the definition of a sufficient statistic that T is 
sufficient for a subset of if and only if the conditional probability 






functions P for y c ~Vl, are all equal. 

Lemma 4: If '7^ is dominated, and T is sufficient forT'/^, then 

T is sufficient for'7^ . 


Proof; Let X be that measure equivalent to whose existence is assured 
by Theorem 1. If ye then u can be written 


y = Z 6. V 

i=l ^ i 

with 3^ > 0 ^ for i = 1 k. Hence, 

Thus T is sufficient for CO?t) and hence is sufficient for"/2.. 


In order to characterize sufficient statistics for Tt.'^ it suffices, 

by Tlieorem 2, to consider a pair 

'*1 ■ ilii “i 


''j ■ jw "j 

in 7/ . where I and J are finite sets; > 0 for k e TuJ; and the 
measures {y.} „ are distinct members of /•'/, as are the measures {y.}. 

The set $C(\mathcal{M})$ of all finite mixtures of elements of $\mathcal{M}$ is said to be
identifiable (Teicher, 1960, 1961; Yakowitz, 1969) if each element of $C(\mathcal{M})$
can be expressed in only one way as a linear combination with positive
coefficients of elements of $\mathcal{M}$, except for the order of the summands. Equivalently,
$C(\mathcal{M})$ is identifiable if the set $\mathcal{M}$ is linearly independent over the real numbers.




The concept of identifiability is very important in establishing the
uniqueness and consistency of various estimators of the so-called mixing
parameters $\{\beta_i : i \in I\}$ in a mixture (Yakowitz, 1969).

Given a mixture in COVO we have for each E zQ- , Y z'Q, 

/ P (E )dn-rT~^ , „-l- 

p = Pj. (E n T (F )) 

= T. 3.P. Ce n T ^(f )) 
i£l ^ ^ 





■*‘i -1 

:r dM T , 

dp 


Let be the equivalence classes in I modulo the relation i = k 
and only if P = P ; that is, if and only if T is sufficient for the 
pair ‘ Then we have 


dp.T 


-1 


F ^i 


dpjT 


ip,-T 




I ih icl„ ^i ^ '^5^1 


A d-T -I. 


where P (e ) is the common value of the P (e ) for i e I„. Thus, 


P 






^ dpi T ^ 

p =7 & p 

'‘-1 d „ i -^ « 

I 




ill: 
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where is the mixture 

■L A 


Uj = 2 B. V. 

a ^ a 

Whenever the conditional probability function P of a mixture p, is 

p j 1 

written in this fashion with being equivalence classes modulo the 

relation we will say that is written in normal form. 

Definition 5 : The set C(y/l) is conditionally identifiable with respect to 


the statistic T if for each pair in C(7^i) ^ whenever and 

P , P are expressed in normal form 


r dy t"^ 

= .1, p 




Pi 


d\i T"^ 

I 

dy T~^ 
s J 

P = , Z, ^ P 
y, k=l — 




-1 




then r = s and for each Ji = there exists exactly one k = l,...,r 

dpjT 

such that — - — = — and P = P . The set C(7?t) is 

. , -1 yj y 

dy^ X dy r 

j 

marginally identifiable with respect to T if the set {pT ^|p is 

linearly independent over the real numbers. 

Theorem 6 : If $C(\mathcal{M})$ is both marginally identifiable and conditionally
identifiable with respect to a statistic T, then $C(\mathcal{M})$ is identifiable.

Proof : Suppose Pj- 2 3-P.= 2 ^.P.-P,* where the measures in each 

^ i£l ^ ^ jeJ ^ ^ 

sum are distinct members of Vi . Then, expressed in normal form, 


I 


dUr dy, T~^ 

r 1 r J 

p = r p ^ T. — ^ — 2i p =s p 

'‘l ^ dM,T-l “l, \ *■/ 


^ ■- 

and we may 

assume without loss of 

generality that 

> ; 

i : 


dyj- t“^ 
H 

dy 

t 

3 

» 



dUjI-l 

i ■■ 

and 

PUl - Pllj 

for SL = 1, . . . ,r 


-1 -1 -1 -1 
Since y^T = y,T , it follows that y T = y T . For i,k e I o , 

^i^ 4 y^T » for otherwise, since Py^ = would have y^ = Uj^, 

contradicting the assumption that {y^ : id} are distinct. Similarly, the 
y,T ^ for j e J are all distinct. Since Cil^O is marginally identifiable^ 
I and J have the same number of elements and for each 1 e there is 

Xf Xt 

a unique ,j(i) E such that (i) ^i^' ^ ~ ^j(i)*^ 

P, = P J it follows that y. = y./.s for each i E 1«. Therefore, 

"i “JCI) 1 jW t 

there Is one to one map i from I onto J such that B./.s = B. snd 

j(x) 1 

^j(i) ” each i £ 1* Hence, C(7^) is identifiable, and the proof 

is complete. 

For conditionally identifiable sets of measures, the following theorem
and its corollary provide some characterizations of sufficient statistics.


Theorem 7: If is homogeneous, COfl) is conditionally identifiable 

with respect to a statistic T, and iri CChO^ then T is 

sufficient for the pair p , y if and only if there exist partitions 

L J 

1 = 1, u ... ul and J = J, u ... uJ such that for each 5. = l,...,r: 
1 r 1 r 


(a) 


d( I g n,)/ d( i; s 


ieT.e 


1 X 


jeJi "r du, 


dy, 


and 

(b) T is sufficient for the set Nj^ = : k e I^uJ^j^}. 

Proof : First suppose such partitions exisl By (b) T is sufficient for the 


set N- and hence, by lemma 4, it is sufficient for the pair {y ,y }. It 

1 

follows from (a) and Theorem 3 that T is sufficient for the pair {y ,y^}. 

L J 

Suppose that T is sufficient for the pair {y^,yj}. Then, expressed in 
normal form, ^ 

^ r djj, 

.2, -t-A- 1 Py. = r P. 


a=i dy. T 


A=1 dyjT-1 


and we may assume without loss of generality that 

„-l j,. 


dp, X 


dy, 


dPjT 


.-1 


dyjT 


^ 


= P 


for each 1. 


The condition P = P 

yi yj 


is equivalent to (b, . By Theorem 3, there exists a 

,-l 


representative f £ 


dyp ^ dy^T 

— which is T i'o) raeasureable. If g e 
dy T 


dyjT~l’ 


then g-T is T C"^) measureable and for each F e^. 


/ g*T dPj 

T"1(F) 


/ g d)i T 

F 


,-l 


y^T"^(F) 


/ f dy 
T-1<F) 
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It follows that g-T = f a.e.(y ). Thus, 


dy.T"^ dy t“^ 

* T = {g.T i g e } 


Since T is also sufficient for the pair {y ,y^ }, a similar argument 

gives 


dyr T dy„ 

il ^ - H 

dy, 


for each Si. Since 


for each il, it follows that (a) 


holds for each S. and the proof is complete. 

Corollary 8 : If $\mathcal{M}$ is homogeneous and $C(\mathcal{M})$ is conditionally identifiable
with respect to a statistic T, then T is sufficient for a pair $\{\mu_I, \mu_J\}$
in $C(\mathcal{M})$ if and only if there exist subsets $I_1 \subset I$ and $J_1 \subset J$ such that:


(b) T is sufficient for N = ; k e I^ u J^^}, 


Proof: That T sufficient implies the existence of 1^^ and satisfying 

(a) and (b) is immediate from Theorem 7. Conversely if Ij^ and satisfy 




(a) and (b) , then T is sufficient for u , u by (b) and hence, by (a), 

-^1 


T is sufficient for 


Given a pair of mixtures M , p in C(/V(), we will call their 
. JL 

dPj- 

likelihood ratio indecomposable if cj and 


dp. 


dUi dy^ 

— imply I, = I and J, = J. It is clear from Theorem 7 that 

dPj dpj ^ 1 1 


if 0(1)1) is conditionally identifiable with respect to T and a pair of 


mixtures Pj in C(Wl) have an indecomposable likelihood ratio, then 


T is sufficient for {y^, pj} if and only if it is sufficient for 


{y^ ; k e I u J}. Also, it is not difficult to see that for each pair 


Pj, Pj in C(ltt) there exist nonempty subsets c l and c J such 
that 


dy, 




dy. 


dy. 




and Lhe likelihood ratio 




is indecomposable. If $\mu_I$ and $\mu_J$ represent the probability laws for two
alternative hypotheses, then there would be two advantages in being able to
identify subsets $I_1$ and $J_1$ satisfying these two criteria. First, the maximum
likelihood decision procedure would be simplified, and second, the search for a
statistic sufficient for deciding between the two hypotheses and having the
property that $C(\mathcal{M})$ is conditionally identifiable could be restricted to those
statistics sufficient for $\{\mu_k : k \in I_1 \cup J_1\}$.


3. Sufficient Linear Statistics for Mixtures of Normals: 


If is a subring of the ring introduced in Section 2, then with the 








usual definition of addition and multiplication by elements of ^ fhe set 
of all functions (}) : (2 is a module over ^ . Thus, it is natural to 

cons ider T^- independence of a set ^ of such functions. To be precise, is 


/^independent if whenever ^ finite set of distinct elements of 


^ and Y , . . . ,y are elements of ^f( such that 
1 m 


Ti4^tCe) ■b...+ y<!i(E)=0 for each E ^ (I, f 
11 mm 


then Yi= ••• = Y =0, If 'k is a subring of '/ which contains all the 
1 m , ^ 


J-i 

bounded Radon-Nikodym derivatives — y> ^ ^ C(7?0 , then it is clear 
that 7\ -independence of the set {P^ : p z'TT]} implies that C(7?D is 


-1 


conditionally identifiable with respect to T. 

For the remainder of this section we will assume that X is $\mathbb{R}^n$, Y is $\mathbb{R}^k$
(k < n) and $T : X \to Y$ is linear and full rank. $\mathcal{A}$ and $\mathcal{B}$ are, respectively,
the Borel fields on $\mathbb{R}^n$ and $\mathbb{R}^k$. We also assume that each $\mu \in \mathcal{M}$ is described
by a normal density function $f_\mu$ with mean $m_\mu$ and covariance $\Omega_\mu$. That is,
for each $E \in \mathcal{A}$,

$$\mu(E) = \int_E f_\mu \, d\lambda_n,$$

where $\lambda_n$ is Lebesgue measure on $\mathbb{R}^n$.


By a suitable choice of the coordinate system, we may represent the densities
$f_\mu$ as joint density functions $f_\mu(y,z)$ on $\mathbb{R}^k \times \mathbb{R}^{n-k}$ while representing T
as the projection T(y,z) = y. Then the marginal densities

$$g_\mu(y) = \int_{\mathbb{R}^{n-k}} f_\mu(y,z)\,dz$$

are normal with means $Tm_\mu$ and covariance matrices $T\Omega_\mu T^T$ (Anderson, 1958).




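The conditional density functions

$$h_\mu(z \mid y) = \frac{f_\mu(y,z)}{g_\mu(y)}$$

are normal as functions of z with means

(1) $S m_\mu + S\Omega_\mu T^T (T\Omega_\mu T^T)^{-1}(y - T m_\mu)$

and covariances

(2) $S\Omega_\mu S^T - S\Omega_\mu T^T (T\Omega_\mu T^T)^{-1} T\Omega_\mu S^T$,

where S is the linear operator S(y,z) = z. The conditional probabilities
$P_\mu(E \mid \cdot)$ are represented by

$$P_\mu(E \mid y) = \int_{S_y(E)} h_\mu(z \mid y)\,dz$$

where $S_y(E) = \{z \in \mathbb{R}^{n-k} \mid (y,z) \in E\}$.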
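
As an illustration only, the conditional mean and covariance can be checked numerically in the simplest case n = 2, k = 1, where T(y,z) = y and S(y,z) = z. The sketch below assumes the standard conditional-normal formulas; the function names and the numerical values of the mean and covariance are hypothetical and chosen purely for the check. It compares the ratio of the joint density to the marginal density with a univariate normal density having the conditional mean and variance.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bivariate_pdf(y, z, m, omega):
    """Bivariate normal density; omega = ((a, b), (b, d)) is the covariance."""
    (a, b), (_, d) = omega
    det = a * d - b * b
    dy, dz = y - m[0], z - m[1]
    # Quadratic form (y - m)' omega^{-1} (y - m), written out for the 2 x 2 case.
    quad = (d * dy * dy - 2.0 * b * dy * dz + a * dz * dz) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def conditional_params(y, m, omega):
    """Mean and variance of z given y when T(y, z) = y and S(y, z) = z."""
    (a, b), (_, d) = omega
    mean = m[1] + b / a * (y - m[0])   # S m + S omega T'(T omega T')^{-1} (y - T m)
    var = d - b * b / a                # S omega S' - S omega T'(T omega T')^{-1} T omega S'
    return mean, var

# Hypothetical parameter values, chosen only for illustration.
m = (1.0, -2.0)
omega = ((2.0, 0.6), (0.6, 1.0))
y, z = 0.5, -1.5

cond_mean, cond_var = conditional_params(y, m, omega)
ratio = bivariate_pdf(y, z, m, omega) / normal_pdf(y, m[0], omega[0][0])
direct = normal_pdf(z, cond_mean, cond_var)
print(cond_mean, cond_var, ratio, direct)
```

The two densities `ratio` and `direct` should agree up to rounding, which is the content of the statement that h(z | y) is normal with the stated mean and covariance.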

Theorem 9 : If $\mathcal{M}$ is a family of Borel measures on $\mathbb{R}^n$ given by n-variate
normal density functions and $T : \mathbb{R}^n \to \mathbb{R}^k$ is linear of rank k, then
$C(\mathcal{M})$ is conditionally identifiable with respect to T.


Proof: It can readily be verified that conditional ident-? f lability of CffTt) 

is not affected by the change of variables lust described. If y and y 

dpi-j- ^ 

are in CCWt) , then the Radon-Nikodym derivative . ^ , is represented by a 

dy^T" 

function of* the form 


gj-(y) 


•^T 


B.g (y) / 

1 


E 

jcJ 


B.S., Cy); 
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i.e., a ratio of mixtures of k-variate normal density functions, which 

is continuous. Hence, by the remarks in the first paragraph of this section, 

it suffices to show that the set ^ ^ of conditional density 

functions is /^-independent, where jfZ is the subring of ^ consisting of those 

elements of ^ which have a continuous representative. To this end, let 

P ,...,P be distinct and let y , . . • ,y be continuous real valued 
Pi Pr . i r 

functions on such that for each E £ (2 , 


Yi(y)P^ ( Ely) +....+ Y^(y)P„ (e ly) = o 


r ' ]ij. 


for almost all y. In particular, choosing for E sets of the form x K, 

11“ k. 

where K is a borel set in , we have 


Y-,(y) / h (zly)dz +. . .+ Y,.(y) / b (z|y)dz = 0 
i K ^r 


K ^1 


for almost all y. For each K, / h (z|y)dz is a continuous function of 


k 


y. Hence » 


/ (Yi(y)h (ziy) +...+Y„(y)h (z|y|)dz = 0 


for each y e fj;? • It follows that 


y (y)h (z|y) +...+ y (y)h (zly) = 0 

X ^1 ^ 


k- n,““lc h 

for each y £ (f.-; , z e , Let F be the set of y ^ where two or 

more of the conditional density functions FP (zIy) are equal as functions 


of z. It is easily seen from (1) and (2) that the Lebesque measure of F is 

zero. For y F , (h (•ly),...,h ( *Iy)} is a set of distinct normal 

*^i *^r 

density functions of z. Hence, (Yakowitz and Spragins; 1968), they are 

linearly independent over the real numbers. Therefore, for y ^ F , 

Yi (y) ••• ~ y (y) = O* That is, Y, = • ■ • - Y ^0 as elements of ^ . 

L V It 

has a density function 

The following theorem is an 

Theorem 10 ; Given the assumptions of Theorem 9, the statistic T is 

sufficient for a pair y^} in C(7/p if and only if there exist partitions 

I “ 1- U...UI and J = J, such that for each S, = l,...,r, 

i r 1 r 

(a) j: (3. f,, (x)/ I 3. f„ (x) 

^ jcJji J 

n 

= r 3. f (x)/ E 3 f,, for each x £ , 

iel ^i jEJ j ’j 

and 

(b) ,T is sufficient for the family {f^^ : k E normal 

Ic 

density functions* 

There is set of purely algebraic conditions which are equivalent to (b) ; 


Thus, COft) is conditionally identifiable* 

If is in C(7)0 i then Uj 


f = 2 S. f 
iel ^ 


which is a mixture of normal density functions* 
immediate consequence of Theorems 7 and 9* 
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namely, that the expressions 


(T T^)“^T 

>^k \ \ he 


m - (T ft T^)“^T m 

\ \ \ 


ft T^(T ft T^)”^ 


are all independent of k e uJ^ (Peters, Redner, and Decell; 1976) 
CHARACTERIZATIONS OF LINEAR SUFFICIENT STATISTICS


by 

B. Charles Peters, Jr.,^ Richard Redner,^ 
and Henry P. Decell, Jr.^ 

We develop necessary and sufficient conditions that a surjective 
bounded linear operator T from a Banach space X to a Banach 
space Y be a sufficient statistic for a dominated family of 
probability measures defined on the Borel sets of X . We give 
applications of these results that characterize linear sufficient 
statistics for families of the exponential type, including as 
special cases the Wishart and multivariate normal distributions. 
The latter result is used to establish precisely which procedures 
for sampling from a normal population have the property that the 
sample mean is a sufficient statistic. 


^Author was partially supported by NASA/JSC Contract NAS-9-15000 
with the University of Houston during the preparation of this 
work . 


1. Introduction : Let T be a surjective measurable transformation
from the measurable space $(X, \mathcal{A})$ to the measurable
space $(Y, \mathcal{B})$, and let $\mathcal{D}$ be a set of totally finite measures on
$\mathcal{A}$. Following Halmos and Savage [2], we say that T is a
sufficient statistic relative to $\mathcal{D}$ if for each $E \in \mathcal{A}$ there
exists a measurable function $P(E \mid \cdot) : (Y, \mathcal{B}) \to \mathbb{R}$ (the real numbers)
such that for each $F \in \mathcal{B}$, $\mu \in \mathcal{D}$,

$$\mu(E \cap T^{-1}(F)) = \int_F P(E \mid y)\, d\mu T^{-1}(y).$$

In another, nonequivalent definition of a sufficient statistic given
by Lehmann and Scheffé [3], $\mathcal{B}$ is always taken to be $\mathcal{B}_T$, the
largest σ-field on Y consistent with the measurability of T.
Bahadur [1] discusses the relationship between these two definitions
at length.

In this paper our particular concern is that of developing
necessary and sufficient conditions that a surjective bounded
linear operator T from a Banach space X to a Banach space Y
be a sufficient statistic, where $\mathcal{A}$ and $\mathcal{B}$ are the respective
Borel fields of X and Y. Our first theorem shows that under
a very natural condition the aforementioned definitions of
sufficiency are equivalent. Specifically, the condition is that
$\ker T = \{x \in X \mid Tx = 0\}$ be complemented in X; that is, for some
closed subspace S of X, $X = \ker T \oplus S$. (For example, if X
is a Hilbert space, take $S = (\ker T)^{\perp}$.) As a corollary we obtain
a simple characterization of sufficient linear statistics for
dominated sets of measures. In Theorem 3 we replace the condition
that ker T be complemented with conditions on the density functions
corresponding to a dominated set $\mathcal{D}$. Finally, we give applications
of these results that characterize linear sufficient statistics
for families of the exponential type, including as special cases
the Wishart and multivariate normal distributions. The latter
result is used to establish precisely which procedures for sampling
from a normal population have the property that the sample mean is
a sufficient statistic. This generalizes the classical result that
the sample mean is sufficient for independent samples. The final
result deals with the connection between linear sufficient statistics
and the Gauss-Markov theorem.

If W is a Banach space, B(W) will denote the Borel field
generated by the open sets of W. The totally finite measures
defined on B(W) will be denoted by M(W). We will write $\mu \ll \nu$
for the relation of absolute continuity and $d\mu/d\nu$ for the equivalence
class of Radon-Nikodym derivatives of $\mu$ with respect to $\nu$.
For the definitions of a dominated set of measures, equivalent sets
of measures, and their connection with σ-finite measures defined
on B(W), we refer the reader to Halmos and Savage [2].

2. Principal Results : Our first theorem shows that if ker T is
complemented in X, then the two definitions of sufficiency
described in the introduction are equivalent.

Theorem 1 : Let X and Y be Banach spaces, let $\mathcal{A} = B(X)$ and let
T be a surjective bounded linear operator from X to Y such that
ker T is complemented in X. Then $\mathcal{B}_T = B(Y)$.

Proof: Since T is Borel measurable, it suffices to show that
$\mathcal{B}_T \subset B(Y)$. Let S be a closed subspace of X such that
$X = \ker T \oplus S$. If $F \in \mathcal{B}_T$, then $T^{-1}(F) \in B(X)$ and if $\hat{T}$
denotes the restriction of T to S, then
$\hat{T}^{-1}(F) = T^{-1}(F) \cap S \in B(X)$. It follows that $\hat{T}^{-1}(F) \in B(S)$, and
since $\hat{T}$ is a topological isomorphism, $F = \hat{T}\hat{T}^{-1}(F) \in B(Y)$.

Henceforth, we will assume that X and Y are Banach spaces;
$\mathcal{A} = B(X)$, $\mathcal{B} = B(Y)$ and $T : (X, \mathcal{A}) \to (Y, \mathcal{B})$ is a surjective bounded
linear operator. According to [2, Lemma 7], for a dominated
collection of measures $\mathcal{D} \subset M(X)$ a measure $\lambda$, equivalent to
$\mathcal{D}$, can be defined by

$$\lambda(E) = \sum_{i=1}^{\infty} a_i \mu_i(E)$$

where $\{\mu_i\}_{i=1}^{\infty}$ is a countable subset of $\mathcal{D}$ which is equivalent
to $\mathcal{D}$ and $\sum_{i=1}^{\infty} a_i \mu_i(X) < \infty$. Obviously, if $\mathcal{D}$ is homogeneous, we
can take $\lambda \in \mathcal{D}$. Combining the results of Theorem 1 with those
of Lemma 2 and Theorem 1 of [2], we have:


Theorem 2 : If ker T is complemented in X, then T is sufficient
for $\mathcal{D}$ if and only if for each $\mu \in \mathcal{D}$ there exists a real valued
function $g_\mu$ on Y such that $g_\mu \circ T \in d\mu/d\lambda$.




Proof : By Theorem 1 of [2], T is sufficient if and only if for
each $\mu \in \mathcal{D}$ there exists a real valued Borel measurable function
$g_\mu$ on Y such that $g_\mu \circ T \in d\mu/d\lambda$. Since ker T is complemented
in X, $B(Y) = \mathcal{B}_T$ and each real valued function $g_\mu$ such that
$g_\mu \circ T$ is Borel measurable on X must be Borel measurable on Y.

In all that follows $\delta g(x;z)$ will denote the Gateaux
differential of the function g at x in the direction of z.

Corollary 1 : If ker T is complemented in X, then T is
sufficient for $\mathcal{D}$ if and only if for each $\mu \in \mathcal{D}$ there exists
$f_\mu \in d\mu/d\lambda$ such that $x \in X$ and $y \in \ker T$ implies $\delta f_\mu(x;y) = 0$.

Proof : If T is sufficient, then for each $\mu \in \mathcal{D}$ there exists
$g_\mu : Y \to \mathbb{R}$ such that $f_\mu = g_\mu \circ T \in d\mu/d\lambda$. It follows immediately
that $\delta f_\mu(x;y) = 0$ for each $x \in X$, $y \in \ker T$.

If $f_\mu \in d\mu/d\lambda$ and $\delta f_\mu(x;y) = 0$ for $\mu \in \mathcal{D}$, $x \in X$, $y \in \ker T$,
then $f_\mu(x+y) = f_\mu(x)$ for each $x \in X$, $y \in \ker T$. For $z \in Y$
define $g_\mu(z) = f_\mu(x)$ where z = Tx. Then $g_\mu$ is well defined
and $f_\mu = g_\mu \circ T$. Hence, T is sufficient.

The next theorem concerns a replacement of the complemented
kernel condition whenever there is a continuous Radon-Nikodym
derivative $f_\mu \in d\mu/d\lambda$ for each $\mu \in \mathcal{D}$.

Theorem 3 : Let $V \subset X$ be an open set such that $\lambda(X \setminus V) = 0$ and
let $\lambda(U) > 0$ for each nonempty open subset U of V. Suppose
$\lambda(B+y) = 0$ whenever $B \subset V$, $\lambda(B) = 0$ and $y \in \ker T$. For
each $\mu \in \mathcal{D}$, let $f_\mu \in d\mu/d\lambda$ be continuous on V. Then T is
sufficient if and only if $f_\mu(x) = f_\mu(z)$ whenever $x, z \in V$ and
Tx = Tz.

Proof : If T is a sufficient statistic, then there exists $g_\mu \in d\mu/d\lambda$
such that $g_\mu(x) = g_\mu(z)$ whenever $x, z \in V$, Tx = Tz. Let $\mu \in \mathcal{D}$
and $y \in \ker T$ be fixed. The set

$$U = \{x \in V \cap (V-y) \mid f_\mu(x) \neq f_\mu(x+y)\}$$

is an open subset of V contained in $B \cup (B-y)$ where
$B = \{x \in V \mid f_\mu(x) \neq g_\mu(x)\}$.
Since $\lambda(B) = 0$, it follows from the hypothesis that $\lambda(U) = 0$
and hence, $U = \emptyset$. Thus $f_\mu(x) = f_\mu(x+y)$ whenever $x, x+y \in V$.

Conversely, suppose $f_\mu(x) = f_\mu(z)$ for $\mu \in \mathcal{D}$, $x, z \in V$
whenever Tx = Tz. The function $g_\mu : T(V) \to \mathbb{R}$ defined by
$g_\mu(Tx) = f_\mu(x)$ for $x \in V$ is well defined on T(V). Since
$f_\mu$ is continuous on V, $f_\mu = g_\mu \circ T$ on V, and T is an open
mapping, it follows that $g_\mu$ is continuous on the open set T(V).
For $y \notin T(V)$ define $g_\mu(y) = 0$. Then $g_\mu$ is Borel measurable
on Y and $f_\mu = g_\mu \circ T$. Thus T is sufficient for $\mathcal{D}$.

The proof of the following corollary is clear and will be 
omitted. 



Corollary 2 : If, in addition to the hypotheses of Theorem 3, the
set V is convex, then T is sufficient for $\mathcal{D}$ if and only if
$\delta f_\mu(x;y) = 0$ for each $\mu \in \mathcal{D}$, $x \in V$, $y \in \ker T$.

3. Exponential Families : Let X and Y be Banach spaces,
$(H, \langle \cdot \mid \cdot \rangle)$ a Hilbert space and $\nu$ a σ-finite measure on B(X)
such that $\nu(X \setminus V) = 0$ for some nonempty open convex set $V \subset X$
for which $\nu(U) > 0$ for each nonempty open set $U \subset V$. Let
$\mathcal{D} = \{P_\gamma\}$, $\gamma \in \Gamma$, be a family of probability measures having
exponential densities

$$f_\gamma(x) = c(\gamma)\,h(x) \exp \langle Q(\gamma) \mid t(x) \rangle \in dP_\gamma/d\nu$$

where $c(\gamma) > 0$, $h(x) > 0$ on V a.e.($\nu$), $t : X \to H$ is continuous
and Gateaux differentiable on V, and $Q : \Gamma \to H$.

Theorem 4 . Let $T : X \to Y$ be linear, bounded, surjective and
$\nu(B+y) = 0$ whenever $B \in B(X)$, $B \subset V$, $\nu(B) = 0$ and $y \in \ker T$.
If $\beta \in \Gamma$, T is a sufficient statistic for the exponential
family $\mathcal{D}$ if and only if $\langle Q(\gamma) - Q(\beta) \mid \delta t(x;y) \rangle = 0$ for each
$\gamma \in \Gamma$, $x \in X$ and $y \in \ker T$.

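Proof : Under the stated assumptions $\mathcal{D}$ is homogeneous and thus
$\lambda$ may be taken to be an arbitrary element, say $P_\beta$, of $\mathcal{D}$.
Applying Corollary 2, T is sufficient for $\mathcal{D}$ if and only if
$\delta g_{\gamma,\beta}(x;y) = 0$ for each $\gamma \in \Gamma$, $x \in V$, $y \in \ker T$, where

$$g_{\gamma,\beta}(x) = \frac{c(\gamma)}{c(\beta)} \exp\{\langle Q(\gamma) - Q(\beta) \mid t(x) \rangle\}.$$

This is equivalent to $\langle Q(\gamma) - Q(\beta) \mid \delta t(x;y) \rangle = 0$ for each
$\gamma \in \Gamma$, $x \in V$, $y \in \ker T$.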

4. Applications . Let S denote the symmetric n × n matrices,
$\Gamma$ the positive definite elements of S and $\mathcal{D}$ a family of
Wishart probability measures with $m \ge n$ degrees of freedom having
densities

$$f_\gamma(S) = c(\gamma)\,h(S) \exp\{-\tfrac{1}{2} \operatorname{tr}(\gamma^{-1}S)\}.$$

Theorem 5 . If $\beta \in \Gamma$ and $T : S \to \operatorname{range}(T)$ is linear, then T
is a sufficient statistic for the Wishart family $\mathcal{D}$ if and only
if $\operatorname{tr}[(\gamma^{-1} - \beta^{-1})K] = 0$ for each $\gamma \in \Gamma$ and $K \in \ker T$.

Proof. The preliminary conditions of Theorem 4 are satisfied with
$\nu$ = Lebesgue measure on S and the obvious identifications of $c(\gamma)$
and h(S). Let H equal S with $\langle A \mid B \rangle = \operatorname{tr}(AB)$, t(S) = S
and $Q(\gamma) = -\gamma^{-1}/2$. Observe that $\delta t(S;F) = F$ and apply Theorem 4.

Remark: Theorem 5 implies that there is a nontrivial linear
sufficient statistic if and only if there exists a linear manifold
$\mathcal{N} \subsetneq S$ such that $\gamma^{-1} \in \mathcal{N}$ for each $\gamma \in \Gamma$.

We will now apply these results to normal families of
probability measures. In Theorem 6 we will state set theoretical,
algebraic and geometrical conditions, each equivalent to the
condition that T be a linear sufficient statistic for a family
$\mathcal{D} = \{P_\gamma\}$, $\gamma \in \Gamma$, of normal n-variate probability measures having
densities, with respect to Lebesgue measure on $\mathbb{R}^n$,

$$p_\gamma(x) = (2\pi)^{-n/2}\,|\Omega_\gamma|^{-1/2} \exp\left[-\tfrac{1}{2}(x-\eta_\gamma)'\,\Omega_\gamma^{-1}(x-\eta_\gamma)\right].$$

We will assume that for some $\beta \in \Gamma$, $\eta_\beta = \theta$ and $\Omega_\beta = I$.
This requirement imposes no loss of generality since for any
$\beta \in \Gamma$ there exists a nonsingular matrix $M_\beta$ for which
$M_\beta \Omega_\beta M_\beta' = I$ and a change of coordinate system defined by the
transformation $x \to M_\beta(x-\eta_\beta)$ allows one to recover the sufficient
statistic in the original coordinate system.

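Theorem 6 . If $T : \mathbb{R}^n \to \mathbb{R}^k$ is a linear transformation of rank k
and $\mathcal{D} = \{P_\gamma\}$, $\gamma \in \Gamma$, is an arbitrary family of n-variate normal
probability measures such that for some $\beta \in \Gamma$, $\eta_\beta = \theta$ and
$\Omega_\beta = I$, then the following conditions are equivalent:

(1) T is sufficient for $\mathcal{D} = \{P_\gamma\}$, $\gamma \in \Gamma$.

(2) $\ker T \subset [\ker(\Omega_\gamma - I) \cap [\eta_\gamma]^{\perp}]$, $\gamma \in \Gamma$.

(3) For each $\gamma \in \Gamma$,

(a) $T^{+}T\eta_\gamma = \eta_\gamma$

(b) $T^{+}T(\Omega_\gamma - I) = (\Omega_\gamma - I)$

where the notation $(\cdot)^{+}$ denotes the generalized inverse of $(\cdot)$.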

Proof: To see that (1) (2) observe that the preliminary 

conditions of Theorem 4. are satisfied with v = Lebesgue measure 

on X = . Make the obvious identifications for c(y) and 

h(x) . Let M denote the n x n real matrices and define 
n 

Q:r -»• H = M^x X M , t:X H and <•}•> on H , respectively, 
_l _i -1 

by Q(y) = ^y’ " " (XX', X, I) 

and <(Aj,Wj,B^) |(A 2 .W 2 ,B 2 )> = trCA^Ag) + w£w 2 + tr(B£B 2 ) . 

Since Q, t and <• 1 •> satisfy the remaining hypotheses of 
Theorem 4. and 5t(x,z) = (xz' + z'x, z,8 ) for each x, z e , 
it follows that for each y e f > 

ker T C {yeR“:x'(Sl“^-I)y - y'n" = 0 , x e R*'} 

= ker n = ker(fl^-I)0 

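To see that (2) ⇒ (3) note that $T^{+}T$ is the orthogonal
projection on $\operatorname{range}(T') = (\ker T)^{\perp}$. Since $\eta_\gamma \in (\ker T)^{\perp}$,
(3a) holds. Furthermore, $\ker T^{+}T = \ker T \subset \ker(\Omega_\gamma - I)$ implies
$\operatorname{range}(\Omega_\gamma - I) \subset \operatorname{range}(T^{+}T)$ and hence that $T^{+}T(\Omega_\gamma - I) = (\Omega_\gamma - I)$,
which is (3b).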

In order to see that (3) (1) recall the definition of 

Q(y) > t(x) and the fact that 6t(x;z) = (xz'+z'x, z, 6 ) . 


V/e need cftily show that - o'y = 0 for. each y e r , 

x*E X and y e ker T . Using (3b) and symmetry together with 
(3a) it follows that 

x'(«^-I)y - n^y = x'(ft^-I)T'^(Ty) - n^T‘^(Ty) = 0 . 

We state the following corollary without proof. 

Corollary 3 . Under the hypotheses of Theorem 6, there exists
a k × n rank k sufficient statistic for $\{P_\gamma\}$, $\gamma \in \Gamma$, if
and only if there exists a rank k orthogonal projection P on $\mathbb{R}^n$
such that (a) $P\eta_\gamma = \eta_\gamma$ and (b) $P(\Omega_\gamma - I) = \Omega_\gamma - I$ for each $\gamma \in \Gamma$.
Moreover, any k × n rank k matrix T such that $T^{+}T = P$ is a
sufficient statistic for $\{P_\gamma\}$, $\gamma \in \Gamma$.
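
The projection conditions (a) and (b) can be checked mechanically for small examples. The following sketch is only illustrative and makes two simplifying assumptions: the statistic is a single row (k = 1), so that $T^{+}T = T'T/(TT')$, and the family is the hypothetical one with means $\mu(1,1,1)'$ and covariance I. Exact rational arithmetic is used so that equality tests are meaningful.

```python
from fractions import Fraction

def projection_from_row(t):
    """Return P = T^+ T for a 1 x n statistic T given as a list of Fractions."""
    norm_sq = sum(v * v for v in t)
    return [[ti * tj / norm_sq for tj in t] for ti in t]

def mat_mul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def satisfies_projection_conditions(t, means, covariances):
    """Check (a) P eta = eta and (b) P (Omega - I) = Omega - I for every member."""
    n = len(t)
    p = projection_from_row(t)
    for eta in means:
        column = [[v] for v in eta]
        if mat_mul(p, column) != column:
            return False
    for omega in covariances:
        shifted = [[omega[i][j] - (1 if i == j else 0) for j in range(n)]
                   for i in range(n)]
        if mat_mul(p, shifted) != shifted:
            return False
    return True

identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
means = [[Fraction(m)] * 3 for m in (0, 1, -2)]   # eta = mu * (1, 1, 1)'
covariances = [identity, identity, identity]

sample_mean = [Fraction(1, 3)] * 3                 # T = (1/3, 1/3, 1/3)
first_coordinate = [Fraction(1), Fraction(0), Fraction(0)]

mean_is_sufficient = satisfies_projection_conditions(sample_mean, means, covariances)
coordinate_is_sufficient = satisfies_projection_conditions(
    first_coordinate, means, covariances
)
print(mean_is_sufficient, coordinate_is_sufficient)
```

Under these assumptions the sample mean passes both conditions while the first coordinate alone fails (a), in line with the classical sufficiency of the sample mean for independent samples.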

Corollary 4 . If r = {0, !,•••, m-1} , Hq = 6 , - I and 

B = [ njl Hg! * * * Uu,_il ^ linear 

sufficient statistic for the finite family ^^y^ ’ Y ^ r of 
n-variate normal probability measures if and only if 
range (T") = range (B) . Moreover, k = rank B is the smallest 
integer for which there exists a k x n sufficient statistic for 
{Py> , Y e r. 

Proof: The equivalent condition is an immediate consequence of 

Theorem 6. The minimality statement follows from the fact that 

Hf* 

if T is a p X n rank p sufficient statistic then T TB = B , 
hence, T^TBB^ = BB^ . It follows that range (BB^) c range (T^T) 
and, since (BB^)B ** B, satisfies Theorem 6.(3) so that k 
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I Example 1 . Let ^2****’ ^n'*** ^ sequence of univariate 

N(p,a) variables such that the joint density af '^2 ’ ’ * ' '^n 
I i& where 5' = (1,1, •••,!) . Let {P^} , n e R 

be, the family of probability measures having densities 
and T 7 ^ 0 a 1 ^ n matrix. 


Observe that T is sufficient for {P } , p e R if and 

1/2 ^ 
only if TSl^ is sufficient for the family of probability ' 

- _1 /2 
measures , p e R having densities N(yR^ ^n’^^ and, 

according to Theorem 6. , that this is equivalent to the condition 
1/2 _l/2 

that ker T ft C [ft„ €„] , This is equivalent to = ®.^Tn„ 
n • n n n n n 

for some scalar . A simple calculation shows that 

that the statistic T is sufficient for {P } , 

p e R if and only if T *= ^n^n ‘ particular, 

A _i _i 

note that T = T = (t-ft £ ) £'ft is sufficient for {P„ }, p e R and th: 

'‘^n n ^n*^ n M 

A 

T (x^,'**,x^)' is an unbiased estimate of p for each integer n. 

This generalizes the classical result that the sample mean is a 

% 

sufficient statistic for p when tlie samples are 

independent . 


Further note that if T = C'/n (the statistic T for the 

n 

sample mean) is a sufficient statistic for R 

for each integer n , the column sums (row sums) of ft^ are 
identically ^ routine induction argument shows 

that, in the latter case, Cov (x.,x.) - constant for i, j.:l,2,»*-, 
i 7 ^ j. 




Example 2 . Let $y = W\gamma + \varepsilon$, where W is a fixed m × n matrix
of rank n and $\varepsilon \sim N(\theta, I)$. According to the Gauss-Markov theorem,
the minimum variance unbiased linear estimate of $\gamma$ is $\hat{\gamma} = (W'W)^{-1}W'y$.

Let $T = (W'W)^{-1}W'$ and observe that for $\gamma \in \mathbb{R}^n$,
$T'(TT')^{-1}TW\gamma = W\gamma$ and, since $T'(TT')^{-1}T = T^{+}T$, Theorem 6 implies T is a
sufficient statistic for the set of probability measures $\{P_\gamma\}$,
$\gamma \in \mathbb{R}^n$, having densities $N(W\gamma, I)$.
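
The two identities used in this example, TW = I and $T'(TT')^{-1}TW = W$, can be verified exactly for a small design matrix. The sketch below assumes a hypothetical 3 × 2 matrix W (a straight-line fit at the points 0, 1, 2) and uses rational arithmetic; the helper names are introduced here only for illustration.

```python
from fractions import Fraction

def transpose(a):
    return [list(row) for row in zip(*a)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(a):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Hypothetical design matrix of rank 2: intercept and slope at x = 0, 1, 2.
W = [[Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(2)]]

Wt = transpose(W)
T = mat_mul(inverse_2x2(mat_mul(Wt, W)), Wt)      # T = (W'W)^{-1} W'

TW = mat_mul(T, W)                                # should be the 2 x 2 identity
Tt = transpose(T)
projection = mat_mul(mat_mul(Tt, inverse_2x2(mat_mul(T, Tt))), T)  # T'(TT')^{-1}T
projected_W = mat_mul(projection, W)              # should reproduce W
print(TW, projected_W == W)
```

Since every mean $W\gamma$ lies in the range of W and all covariances equal I, reproducing W under the projection is exactly condition (3a) of Theorem 6, while (3b) holds trivially.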

On the other hand, if $\hat{T}$ is a sufficient linear statistic
for $\{P_\gamma\}$, $\gamma \in \mathbb{R}^n$, such that $\hat{T}y$ is an unbiased estimate of $\gamma$
then, since $\hat{T}W = I$, $\hat{T}$ has rank n. Corollary 4 implies that
n is the smallest integer for which there exists a linear n x m 
sufficient statistic for {P } , y e R” . Moreover, T = 

A * 

for some nonsingular n x n matrix B . Since TW = I , 

T ^ (W'W)-\' . 


Since $\hat{\gamma} = Ty$, the Gauss-Markov estimate of $\gamma$ may be
characterized as the unique linear sufficient statistic T for
$\{P_\gamma\}$, $\gamma \in \mathbb{R}^n$, for which Ty is an unbiased estimate of $\gamma$.


f 
INTRODUCTION


In this paper $\Pi_i$ will denote an n-variate normal population
having a priori probability $\pi_i > 0$ and density $p_i(x)$; i = 0,1,...,m.
Using recent results [1] that characterize linear sufficient statistics
we will develop an explicit expression for a k×n compression (k<n)
matrix T for which, using the Bayes classification procedure [2],
in which costs of misclassification are tacitly assumed equal on all
classes, x is assigned to $\Pi_i$ if and only if Tx is assigned to $\Pi_i$. We
will further demonstrate that k is the smallest integer (<n) for
which the latter equivalence is valid and that T can be directly
calculated in terms of the known population means and covariance matrices.

The applications which motivate the necessity for compressing or
reducing the size of a data vector are summarized very well in a review
paper by Laveen Kanal [3]. Our own interest was motivated by a
need to reduce computational requirements in a large area crop inventory
project using multidimensional data taken remotely by near earth
satellites [4].

In all that follows $\eta_i$ and $\Sigma_i$ will, respectively, denote the
mean and covariance matrix of population $\Pi_i$, i = 0,1,...,m. It is well
known that for each nonsingular n×n matrix A and n×1 vector a, the
Bayes assignment of x to $\Pi_i$ is equivalent to the Bayes assignment of
A(x−a) to $\Pi_i$. We will later assume that $\eta_0 = \theta$ and $\Sigma_0 = I$. This
assumption will impose no loss of generality in the results that follow since
we may set $a = \eta_0$ and choose A such that $A\Sigma_0A^T = I$.

If the latter transformation of variables is necessary, we will not
introduce new symbols for the variate $A(x-\eta_0)$, the densities $p_i(A(x-\eta_0))$
and their associated means and covariance matrices. Whenever Q is
an s×n rank s ($s \le n$) matrix, we will denote the s-variate normal
density of Qx by (for population $\Pi_i$) $p_i(Qx)$.
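
One way to obtain a matrix A with $A\Sigma_0A^T = I$ is through a Cholesky factor of $\Sigma_0$. The source does not prescribe a particular construction; the 2 × 2 sketch below is one possible choice, with a hypothetical covariance matrix and mean and with helper names introduced only for this illustration.

```python
import math

def cholesky_2x2(sigma):
    """Lower triangular L with L L' = sigma for a 2 x 2 positive definite sigma."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def inverse_lower_2x2(l):
    """Inverse of a 2 x 2 lower triangular matrix."""
    return [[1.0 / l[0][0], 0.0],
            [-l[1][0] / (l[0][0] * l[1][1]), 1.0 / l[1][1]]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def standardize(a, eta, x):
    """Apply the change of variables x -> A (x - eta)."""
    shifted = [xi - ei for xi, ei in zip(x, eta)]
    return [sum(a[i][j] * shifted[j] for j in range(len(x))) for i in range(len(a))]

# Hypothetical reference population: mean eta_0 and covariance sigma_0.
eta_0 = [1.0, -1.0]
sigma_0 = [[4.0, 2.0], [2.0, 3.0]]

A = inverse_lower_2x2(cholesky_2x2(sigma_0))       # A = L^{-1}
whitened = mat_mul(mat_mul(A, sigma_0), transpose(A))
origin = standardize(A, eta_0, eta_0)              # the mean maps to the zero vector
print(whitened, origin)
```

Any other A with the same property (for example, one built from an eigendecomposition) would serve equally well, since the Bayes assignment is unchanged by nonsingular affine transformations.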

PRINCIPAL RESULTS 


According to [1], let k ($\le n$) be the smallest integer for which
there exists a linear sufficient statistic (k×n matrix T) for the family
of probability measures having densities $p_i(x)$; i = 0,1,...,m. The
results in [1] demonstrate that the sufficiency of T is equivalent



and let M = FG be a full rank decomposition [5] of M; that is, F is n×k,
G is k×(n+1)m and rank(F) = rank(G) = k. Again, according to [1] and
the latter, k must be precisely the smallest integer (<n) for which
a k×n matrix T can be a sufficient statistic for the given family
of probability measures.

It is well known [5] that $M^{+} = G^{+}F^{+}$ and hence that $MM^{+} = FF^{+}$. A
simple computation reveals that $T = F^{+}$ satisfies conditions (1) and (2)
so that $F^{+}$ is a sufficient statistic (of minimum left dimension) for
the given family of probability measures. We have the following
theorem.




Theorem 1 . Let $\Pi_i$ be an n-variate normal population with a
priori probability $\pi_i > 0$, mean $\eta_i$ and covariance $\Sigma_i$; i = 0,1,...,m
(with $\eta_0 = \theta$, $\Sigma_0 = I$) and let
$FG = M = [\eta_1 \mid \eta_2 \mid \cdots \mid \eta_m \mid \Sigma_1 - I \mid \cdots \mid \Sigma_m - I]$
be a full rank ($= k \le n$) decomposition of M. Then, the n-variate
Bayes procedure assigns x to $\Pi_i$ if and only if the k-variate Bayes
procedure assigns $F^{+}x$ to $\Pi_i$. Moreover, k is the smallest integer for
which there exists a k×n compression matrix T preserving the Bayes
assignment of x and Tx to $\Pi_i$; i = 0, 1, ..., m.
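
A minimal numerical sketch of the theorem, under assumptions chosen to keep the algebra one-dimensional: two populations (m = 1) with $\eta_0 = \theta$, $\Sigma_0 = \Sigma_1 = I$ and a hypothetical mean $\eta_1$ and priors. Then $M = [\eta_1 \mid 0]$ has rank 1, one may take $F = \eta_1$, and $F^{+} = \eta_1'/\lVert\eta_1\rVert^2$. The code compares the Bayes scores $\log \pi_i p_i$ computed from x with those computed from the scalar $F^{+}x$.

```python
import math

def log_normal_identity(x, mean):
    """Log density of N(mean, I) at the point x."""
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq - 0.5 * len(x) * math.log(2.0 * math.pi)

def log_normal_1d(t, mean, var):
    """Log density of the univariate N(mean, var) at t."""
    return -0.5 * (t - mean) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)

# Hypothetical two-population problem in R^3.
eta_1 = [2.0, 1.0, -1.0]
priors = (0.3, 0.7)
norm_sq = sum(v * v for v in eta_1)
f_plus = [v / norm_sq for v in eta_1]      # F^+ for the n x 1 factor F = eta_1

def full_scores(x):
    """log(pi_i p_i(x)) for i = 0, 1 using the n-variate densities."""
    zero = [0.0] * len(x)
    return [math.log(priors[0]) + log_normal_identity(x, zero),
            math.log(priors[1]) + log_normal_identity(x, eta_1)]

def compressed_scores(x):
    """log(pi_i p_i(F^+ x)) for i = 0, 1 using the univariate densities."""
    t = sum(f * xi for f, xi in zip(f_plus, x))
    var = 1.0 / norm_sq                    # F^+ I (F^+)' = 1 / |eta_1|^2
    return [math.log(priors[0]) + log_normal_1d(t, 0.0, var),
            math.log(priors[1]) + log_normal_1d(t, 1.0, var)]

def assign(scores):
    """Index of the population with the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

points = [[0.0, 0.0, 0.0], [2.0, 1.0, -1.0], [1.0, 0.5, 0.2], [-1.0, 3.0, 2.0]]
full_assignments = [assign(full_scores(x)) for x in points]
compressed_assignments = [assign(compressed_scores(x)) for x in points]
print(full_assignments, compressed_assignments)
```

In this special case the difference of the two scores is the same in both representations, so the assignments necessarily coincide; the general theorem covers unequal covariances through the columns $\Sigma_i - I$ of M.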

Proof: Recall that the n-variate Bayes procedure assigns x to 

TT. if and only if ir .p .(x)>ir.p. (x) ; i-0,l,...,m: i?* j (with arbitrary 

J J J * ^ 

assignment of x to any of the populations Ilktor which iTjPj(x) = 5* 


Let R be any (n-k) x n matrix such that C = R(I-FF^) has rank 
n-k and note that Tr.p.(x) > Tr.p.(x); i=0,l,...,m: i?5j is equivalent 

J J * X 

pT 

For any q=0,l,...,m, the n-variate normal density Pn([„ 1^) has mean 
nPJ and covariance matrix: 

^ 0 r 

^ ^ rT. r rT. rT 


to 


F'V 

CSqF 


F'ZqC 

CE,C’ 


Condition (1) implies Cn = 0 . Condition (2) implies that I-FF commutes 

T T 

with and it follows that CSqC -CC and Cs^F = 0 . We may therefore 

pT 

writ« Pq^^C product of the respective k-variate and (n-k)- 

variate densities Pq(F^x) and Pq(CxjF^x), the conditional density of Cx 
given F^x. Since Pq(Cx[F'^x)>0 does not depend upon q = 0, 1, ...» m; 
it follows that the n-variate Bayes assignment )f x to llj; j=0,l,.,,, m, 
implies the k-variate Bayes assignment f"'^x to JTj. The foregoing arguments 
are reversible and hence the k-variate Bayes assignment of F^s to IIj 
implies the n-variate Bayes assignment of x to IIj, completing the proof of 
the equivalence. The minimality of k, in the sense that the n-variate
and k-variate Bayes assignments of x and $F^{+}x$ are preserved, is a
consequence of the developments preceding the theorem.

CONCLUDING REMARKS 

Clearly the theorem is valid if there is at least one population
with mean 0 and covariance $I$, in which case we would label that
population $\Pi_0$. If this is not the case, one would choose some
population, say $\Pi_q$, and perform the change of variables
$x \to A(x - \eta_q)$, where $A \Sigma_q A^T = I$, prior to application of the
theorem. The appropriate statistic for compression, in terms of the
original variates, would then be $T = F^T A$.

These results completely characterize the nature of data 
compression for the Bayes classification procedure in the sense 
that k is the smallest allowable data compression dimension consis- 
tent with preserving Bayes population assignment and, moreover, the 
theorem provides an explicit expression for the compression matrix T 
that depends only upon the known population means and covariances. 
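
The following is a minimal numerical sketch of the compression described by the theorem, assuming NumPy. The three populations, the priors, and the helper names (`log_gauss`, `bayes_assign`, `compression_matrix`) are hypothetical and chosen only for illustration; the matrix F is obtained here from a singular value decomposition of M, which is one convenient way (not necessarily the authors' way) to get a full rank factor.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of the normal distribution N(mean, cov) at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def bayes_assign(x, priors, means, covs):
    """Index of the population maximizing prior times density."""
    scores = [np.log(p) + log_gauss(x, m, c)
              for p, m, c in zip(priors, means, covs)]
    return int(np.argmax(scores))

def compression_matrix(means, covs, tol=1e-10):
    """Columns of the returned F span the range of M (population 0 is N(0, I))."""
    n = len(means[0])
    M = np.hstack([np.column_stack(means[1:])] +
                  [c - np.eye(n) for c in covs[1:]])
    U, s, _ = np.linalg.svd(M)
    k = int(np.sum(s > tol * s[0]))   # numerical rank of M
    return U[:, :k]

rng = np.random.default_rng(0)
n = 5
# Hypothetical populations differing from N(0, I) only inside a 2-dimensional subspace.
W, _ = np.linalg.qr(rng.standard_normal((n, 2)))
means = [np.zeros(n), W @ np.array([1.0, -0.5]), W @ np.array([-1.0, 2.0])]
covs = [np.eye(n),
        np.eye(n) + W @ np.diag([1.0, 0.5]) @ W.T,
        np.eye(n) + W @ np.array([[0.5, 0.2], [0.2, 2.0]]) @ W.T]
priors = [0.3, 0.3, 0.4]

F = compression_matrix(means, covs)
T = F.T                                   # k x n compression matrix
small_means = [T @ m for m in means]
small_covs = [T @ c @ T.T for c in covs]

xs = 2.0 * rng.standard_normal((200, n))
full = [bayes_assign(x, priors, means, covs) for x in xs]
compressed = [bayes_assign(T @ x, priors, small_means, small_covs) for x in xs]
```

In this example M has rank 2, so the five-dimensional observations are classified from two compressed coordinates.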

The statistic $T = F^T$ given by the theorem is by no means unique (e.g.,
for any nonsingular $k \times k$ matrix $B$, $T = BF^T$ will do). It is also true
that there may be more efficient methods for calculating the
statistic T (yet to be determined) than the method of full rank
decomposition of M.

It should be noted that the matrix M has an "excellent chance"
of having rank equal to n. Even in the case of two populations (m=2),
there may well be n linearly independent columns among the 2(n+1) columns
of M and, therefore, no integer k<n and k x n rank k compression matrix T
preserving the Bayes assignment of x and Tx.



There has been extensive work [6], [7], [8], [9], [10], [11], [12], [13]
on determination of compression matrices (of a given rank) based upon
criteria that, generally, attempt to describe the relative (to the
variate x) "information content" in the variate Tx (e.g., divergence,
Bhattacharyya distance, Chernoff bound, principal components, Wilks
scatter, etc.). While these criteria provide bases for calculating
compression matrices T, they provide little or no means for determining
the degradation in probability of misclassification or sensitivity to
population assignments.

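In a sampling situation one may choose to replace the columns of the
matrix $M$ by their estimates, that is, $\eta_j$ by $\bar{x}_j$ and $\Sigma_j$ by $S_j$.
The matrix defined by the estimates suggests a compression technique based
on the selection of a k-dimensional hyperplane which in some sense best
fits the range space of the matrix

\[ [\, \bar{x}_1 - \bar{x}_0 \mid \cdots \mid \bar{x}_m - \bar{x}_0 \mid S_1 - S_0 \mid \cdots \mid S_m - S_0 \,], \]

where $\bar{x}_0 = 0$ and $S_0 = I$.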


We feel that the results in this paper shed some light upon the
subject. In future work we intend to extend these results and the results
of [1] to a related concept of an "almost sufficient" statistic.

1. Introduction

Systems of nonlinear equations can seldom be solved exactly. Usually,
one must obtain approximations to the solutions of such systems by iteration.
Quasi-Newton methods (also known as variable metric, variance, secant, update,
or modification methods) constitute a class of iterative procedures which may
be regarded as generalizations of the secant method for solving a single
equation in one unknown. Indeed, not only is the quasi-Newton equation (the
equation characteristically satisfied by the iterates produced by these methods)
a direct extension of the equation which defines the iterates of the secant
method, but also these procedures share many of the computational advantages
of the secant method over Newton's method.

Quasi-Newton methods were first introduced in the papers of Davidon [2],
Fletcher and Powell [4], and Broyden [1]. In spite of their recent origins,
these methods have proved themselves in dealing with practical problems and
have become the subject of a large amount of research. The paper of Dennis
and Moré [3] provides both an excellent in-depth survey and an elegant unified
development of quasi-Newton methods and their theory as understood in the
mid-1970's. The main body of this note is a rearrangement and condensation of
material in [3].

In the following, we first formulate precisely the problem to be solved
and motivate the introduction of quasi-Newton methods by considering the
classical Newton and secant methods and their properties. We then survey
three highly successful quasi-Newton methods: Broyden's method for the
solution of general nonlinear equations, and the Davidon-Fletcher-Powell
and Broyden-Fletcher-Goldfarb-Shanno procedures for unconstrained minimization.
(The last two methods will henceforth be referred to as the DFP and BFGS methods,
respectively.) Finally, we compare the properties of these methods to those of
Newton's method and UHMLE in potential applications to maximum-likelihood
estimation of parameters in mixture distributions.

2. The problem 

We consider the problem of solving $F(x) = 0$ in an open convex subset
D of $R^n$ under the following assumptions on the mapping $F: D \to R^n$:

(a) F is continuously differentiable on D.

(b) There is an $x^*$ in D such that $F(x^*) = 0$ and
$F'(x^*)$ is nonsingular.

Newton's method for iteratively approximating the solution $x^*$ begins with
an initial approximation $x_0$ to $x^*$ and attempts to obtain improved
approximations by the iteration

\[ x_{k+1} = x_k - F'(x_k)^{-1} F(x_k), \qquad k = 0, 1, \ldots \]

The convergence properties of Newton's method which are important here are
summarized in the following theorem.


Theorem: Whenever $x_0$ is sufficiently near $x^*$, there is a sequence
$\{\alpha_k\}_{k=0,1,\ldots}$ of non-negative numbers which converges to zero
and for which

\[ |x_{k+1} - x^*| \le \alpha_k\, |x_k - x^*|, \qquad k = 0, 1, \ldots \tag{1} \]

If, in addition to satisfying assumptions (a) and (b) above, F has a derivative
which is Lipschitz continuous at $x^*$, i.e., there exists a $\kappa$ for which
$|F'(x) - F'(x^*)| \le \kappa |x - x^*|$ for all x sufficiently near $x^*$, then there
exists a constant $\beta$ such that

\[ |x_{k+1} - x^*| \le \beta\, |x_k - x^*|^2, \qquad k = 0, 1, \ldots \tag{2} \]

whenever $x_0$ is sufficiently near $x^*$.

A sequence which satisfies an inequality of the form (1) with a sequence
$\{\alpha_k\}_{k=0,1,\ldots}$ which converges to zero is said to converge
superlinearly. If a sequence satisfies an inequality of the form (2), then it
is said to converge quadratically. Superlinear convergence is fast; quadratic
convergence is very fast. Since Lipschitz continuity is a very weak assumption,
one might say that the theorem asserts that the convergence exhibited by the
Newton iterates is always fast and almost always very fast.
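
As a small illustration of inequality (2), the sketch below runs Newton's method on a hypothetical scalar test problem, F(x) = x^2 - 2, and records the ratios |x_{k+1} - x*| / |x_k - x*|^2. The function names and the test problem are assumptions made for this example only.

```python
def newton(F, dF, x0, steps):
    """Return the list of Newton iterates x_0, x_1, ..., x_steps (scalar case)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - F(x) / dF(x))
    return xs

# Hypothetical test problem with root x* = sqrt(2).
F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x
root = 2.0 ** 0.5

iterates = newton(F, dF, x0=2.0, steps=5)
errors = [abs(x - root) for x in iterates]
# Ratios e_{k+1} / e_k^2 for the first few steps; they stay bounded,
# which is what quadratic convergence in the sense of (2) requires.
ratios = [errors[k + 1] / errors[k] ** 2 for k in range(3)]
```

For this problem the ratios equal 1 / (2 x_k), so any constant slightly above 1 / (2 sqrt(2)) serves as the bound in (2) once the iterates are close to the root.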

The rapid convergence of the Newton iterates is the major advantage of
Newton's method. Another advantage is that Newton's method is "self-corrective"
in the sense that $x_{k+1}$ depends only on F and $x_k$, so that bad effects of
previous iterations are not carried along. (Quasi-Newton methods are not
self-corrective in this sense.) Balanced against these advantages is the fact that
Newton's method often requires a great deal of computation at each iteration.
Indeed, the determination of each iterate requires $O(n^2)$ function evaluations
and $O(n^3)$ arithmetic operations. Thus one is led to ask whether there
are methods which retain fast convergence while requiring fewer function
evaluations and arithmetic operations at each iteration.

With this question in mind, consider the secant method in the case
n = 1. This method begins with an initial approximation $x_0$ to $x^*$ and
defines successive approximations by the iteration

\[ x_{k+1} = x_k - \frac{x_k - x_{k-1}}{F(x_k) - F(x_{k-1})}\, F(x_k). \]

One may regard the secant method as being obtained from Newton's method by
replacing the derivative $F'(x_k)$ by a finite-difference approximation. A
particular consequence is that the number of function evaluations per iteration
is reduced from two for Newton's method to one for the secant method while the
number of arithmetic operations per iteration is not significantly increased.

It can be proved that, for $x_0$ sufficiently near $x^*$, the iterates produced
by the secant method exhibit superlinear convergence rather than quadratic
convergence as in the case of the Newton iterates. Nevertheless, superlinear
convergence is still fast, and experience has shown that, as a general-purpose
algorithm, the secant method is more efficient in total computation time than
Newton's method. This suggests that generalizations of the secant method to
higher dimensions might be similarly successful.
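
The secant iteration can be sketched as follows. This is an illustrative implementation only: the function name `secant`, the stopping tolerance, and the choice of two starting values (the formula needs both x_{k-1} and x_k) are assumptions of this example. The counter makes visible that each iteration costs a single new function evaluation.

```python
def secant(F, x_prev, x_curr, tol=1e-12, max_iter=50):
    """Secant method for a scalar equation F(x) = 0.

    Returns the final iterate and the number of function evaluations used.
    """
    f_prev, f_curr = F(x_prev), F(x_curr)
    evaluations = 2
    for _ in range(max_iter):
        if abs(f_curr) <= tol:
            break
        # Finite-difference slope replaces the derivative F'(x_k).
        x_next = x_curr - (x_curr - x_prev) / (f_curr - f_prev) * f_curr
        x_prev, f_prev = x_curr, f_curr
        x_curr = x_next
        f_curr = F(x_curr)          # the only new evaluation in this iteration
        evaluations += 1
    return x_curr, evaluations

# Hypothetical test problem: F(x) = x^2 - 2 with starting values 1 and 2.
approx_root, evaluations = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```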


3. Quasi-Newton methods 

Quasi-Newton methods are generalizations of the secant method which are
applicable to problems of the type at hand involving an arbitrary number of
independent variables. The key properties of these methods are that the
iterates exhibit superlinear local convergence and that each iteration
requires n function evaluations and $O(n^2)$ arithmetic operations. In
spite of the fact that quasi-Newton methods do not have the quadratic
convergence property of Newton's method, the comparatively small number of
function evaluations and arithmetic operations make them preferable to Newton's
method in many applications.

Quasi-Newton methods have the general form

\[ x_{k+1} = x_k - B_k^{-1} F(x_k), \]

where $B_k$ satisfies the quasi-Newton equation

\[ B_k (x_k - x_{k-1}) = F(x_k) - F(x_{k-1}). \tag{3} \]

Note that $B_k$ has the action of a finite-difference approximation to
$F'(x_{k-1})$ in the direction $(x_k - x_{k-1})$. Thus quasi-Newton methods in
general bear the same relation to Newton's method as the secant method in the
case n = 1.

It is clear that the secant method is a quasi-Newton method. In fact,
if n = 1, then the quasi-Newton equation determines the scalar $B_k$ exactly,
and so the secant method is the only quasi-Newton method in this case. If
n > 1, then the quasi-Newton equation alone does not determine $B_k$ uniquely;
hence, there is no unique natural extension of the secant method to the case
of an arbitrary number of independent variables. This lack of uniqueness in
the general case may be regarded as an advantage, for it allows a variety of
quasi-Newton algorithms which may be drawn upon to take advantage of any
special structure which may be present in specific problems of interest.






When n > 1, one must impose relations between successive matrices $B_{k+1}$
and their predecessors $B_k$ which, together with the quasi-Newton equation,
uniquely determine these matrices inductively. In general, those relations
are chosen with an eye toward minimizing the computational complexity of the
resulting update formula for determining $B_{k+1}$ from $B_k$, $x_k$, $x_{k+1}$, and F
while taking maximal advantage of whatever special structure may be shared by
the particular problems under consideration. Of the three quasi-Newton methods
presented below, the first (Broyden's method) is intended to be a general
purpose algorithm which can be applied to all problems without regard to
special structure. Consequently, in Broyden's method, $B_{k+1}$ is obtained by
adding a rank-one "correction term" to $B_k$ in such a way that the
quasi-Newton equation is satisfied and $B_{k+1}$ agrees with $B_k$ on the orthogonal
complement of $(x_{k+1} - x_k)$. In a sense, this may be regarded as the "simplest"
way to obtain $B_{k+1}$ from $B_k$ in such a way that the quasi-Newton equation is
satisfied. On the other hand, the other two methods (the DFP and BFGS methods)
are designed for unconstrained minimization problems, in which the Jacobian
$F'(x)$ can be expected to be symmetric and positive-definite. Thus the update
formulas for these methods are such that the successive $B_k$ "inherit"
symmetry and positive-definiteness from the preceding ones. Not surprisingly,
these formulas are more complex than the update formula of Broyden's method.
In fact, in order to guarantee hereditary symmetry and positive-definiteness,
it is necessary in these formulas to determine $B_{k+1}$ from $B_k$ with a
correction term of rank two.
4. Broyden's method for general nonlinear equations

Broyden's method is, in a sense, the "simplest" of the most popular
quasi-Newton methods and is intended to be a general-purpose algorithm for
solving arbitrary nonlinear equations. To derive the formula used in Broyden's
method to update the matrices $B_k$, suppose that, for some $k \ge 0$, one has
arrived at $x_k$ and $B_k$. Then $x_{k+1}$ can be generated by the formula

\[ x_{k+1} = x_k - B_k^{-1} F(x_k). \]

Our objective is to use $x_k$, $x_{k+1}$, $B_k$ and F to update $B_k$ in the
"simplest" way to obtain a matrix $B_{k+1}$ which satisfies the quasi-Newton
equation.

For convenience, we adopt the following notation:

\[ x = x_k, \quad \bar{x} = x_{k+1}, \quad B = B_k, \quad \bar{B} = B_{k+1}, \quad s = \bar{x} - x, \quad y = F(\bar{x}) - F(x). \]

In this notation the quasi-Newton equation reads $\bar{B}s = y$. Requiring in
addition that $\bar{B}$ agree with $B$ on the orthogonal complement of $s$ gives

\[ \bar{B} = B + \frac{(y - Bs)\, s^T}{\langle s, s \rangle}. \]

Note that $\bar{B}$ and $B$ differ by a rank-one operator. Restoring subscripts,
we obtain the iteration formulas for Broyden's method:

\[ x_{k+1} = x_k - B_k^{-1} F(x_k), \]

\[ B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{\langle s_k, s_k \rangle}, \]

where $y_k = F(x_{k+1}) - F(x_k)$ and $s_k = x_{k+1} - x_k$.
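
A minimal sketch of the Broyden iteration, assuming NumPy, is given below. The function name `broyden`, the stopping rule, and the two-dimensional test system are hypothetical and serve only to illustrate the update formulas; for simplicity the linear system with $B_k$ is solved directly rather than with one of the $O(n^2)$ techniques.

```python
import numpy as np

def broyden(F, x0, B0, tol=1e-10, max_iter=100):
    """Broyden's method; returns the final iterate and the iterations used."""
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float).copy()
    Fx = F(x)
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            return x, k
        s = np.linalg.solve(B, -Fx)          # s_k = x_{k+1} - x_k = -B_k^{-1} F(x_k)
        x_new = x + s
        F_new = F(x_new)                     # the only new evaluation of F
        y = F_new - Fx                       # y_k = F(x_{k+1}) - F(x_k)
        B = B + np.outer(y - B @ s, s) / (s @ s)   # rank-one correction
        x, Fx = x_new, F_new
    return x, max_iter

# Hypothetical test system with solution (1, 1).
def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])

x0 = np.array([1.2, 0.9])
B0 = np.array([[2.0 * x0[0], 2.0 * x0[1]], [1.0, -1.0]])   # Jacobian at x0
solution, iterations = broyden(F, x0, B0)
```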

Does Broyden's method exhibit the key properties attributed to quasi-Newton
methods in the preceding section? It can be shown that if $x_0$ and
$B_0$ are sufficiently near $x^*$ and $F'(x^*)$, respectively, then the Broyden
iterates are well-defined and converge superlinearly to $x^*$. (The proof is
very involved, and we omit it.) Also, it is clear that, for a given value of
k, the determination of $x_{k+1}$ and $B_{k+1}$ requires only the n function
evaluations necessary to specify $F(x_{k+1})$, assuming that $F(x_k)$ can be
provided from storage. Finally, it is evident that, for a given k, $x_{k+1}$
and $B_{k+1}$ can be determined with $O(n^2)$ arithmetic operations if
$B_k^{-1}F(x_k)$ can be evaluated with $O(n^2)$ arithmetic operations.

There are two ways of evaluating $B_k^{-1}F(x_k)$ with $O(n^2)$ arithmetic
operations, both of which require information about $B_{k-1}$. The first way is
based on the Sherman-Morrison formula [8] and produces $\bar{B}^{-1}$ from $B^{-1}$
with $O(n^2)$ arithmetic operations in the following way: write

\[ \bar{B} = B + \frac{(y - Bs)\, s^T}{\langle s, s \rangle} = B + uv^T, \]

where $u = (y - Bs)$, $v = s / \langle s, s \rangle$; then

\[ \bar{B}^{-1} = B^{-1} - \frac{B^{-1} u v^T B^{-1}}{1 + \langle v, B^{-1} u \rangle}. \]

The second way is based on a special factorization procedure due to Gill
and Murray [5] which begins with a factorization $B = QR$ and yields a
factorization $\bar{B} = \bar{Q}\bar{R}$ with $O(n^2)$ arithmetic operations. (Here, $Q$ and
$\bar{Q}$ are orthogonal and $R$ and $\bar{R}$ are upper-triangular.) Since an n-dimensional
linear system whose coefficient matrix is factored in this way can be solved with
$O(n^2)$ arithmetic operations, this allows the evaluation of the terms
$B_k^{-1}F(x_k)$ with $O(n^2)$ arithmetic operations as desired. For reasons of
numerical stability, the Gill-Murray factorization procedure is generally
preferable to the method using the Sherman-Morrison formula.
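
The Sherman-Morrison route can be checked numerically with a short sketch, assuming NumPy. The helper name `broyden_inverse_update` and the sample matrix and vectors are hypothetical. The sketch uses the simplification $B^{-1}u = B^{-1}y - s$, so only $B^{-1}$, $s$ and $y$ are needed and the work is a few matrix-vector products and one outer product.

```python
import numpy as np

def broyden_inverse_update(B_inv, s, y):
    """Inverse of B + (y - B s) s^T / <s, s>, computed from B_inv = B^{-1}."""
    Binv_u = B_inv @ y - s            # B^{-1} u with u = y - B s
    v = s / (s @ s)
    denom = 1.0 + v @ Binv_u          # 1 + <v, B^{-1} u>
    # B^{-1} - (B^{-1} u)(v^T B^{-1}) / denom, an O(n^2) computation
    return B_inv - np.outer(Binv_u, v @ B_inv) / denom

# Hypothetical data for a direct comparison.
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
s = np.array([1.0, 2.0, -1.0])
y = B @ s + np.array([0.1, -0.2, 0.3])

B_bar = B + np.outer(y - B @ s, s) / (s @ s)       # Broyden update of B
B_bar_inv = broyden_inverse_update(np.linalg.inv(B), s, y)
```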

5. The DFP and BFGS methods for unconstrained minimization

For the purposes of this note, the basic problem of unconstrained minimization
may be regarded as the problem of solving $\nabla f(x) = 0$ in an open convex
subset D of $R^n$, where f is a nonlinear functional from D to $R^1$. Clearly, this
problem is of the type introduced in Section 2, with $\nabla f$ playing the role
of F. The special feature of this problem is that the Jacobian of the function
whose zero is being sought is actually the Hessian $\nabla^2 f$, a matrix which is
certainly symmetric. In fact, in most problems of practical interest,
$\nabla^2 f$ is positive-definite near the minimum of f.

It seems reasonable to require that the matrices $B_k$ appearing in a
quasi-Newton method applied to an unconstrained minimization problem be symmetric
and positive-definite. Since each $B_k$ is to be determined from its predecessor
by an update formula, it is reasonable to impose conditions on the update formula
which guarantee that symmetry and positive-definiteness are inherited by the
successive matrices $B_k$. Unfortunately, imposing hereditary symmetry as well as
the quasi-Newton equation completely determines a rank-one update formula, and
this formula does not guarantee hereditary positive-definiteness. Consequently,
one is led to look for rank-two update formulas which insure that the successive
matrices inherit symmetry and positive-definiteness.

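A general rank-two update formula which guarantees hereditary symmetry
is the following:

\[ \bar{B} = B + \frac{(y - Bs)c^T + c(y - Bs)^T}{\langle c, s \rangle} - \frac{\langle y - Bs, s \rangle\, cc^T}{\langle c, s \rangle^2}, \]

where c is any vector in $R^n$ such that $\langle c, s \rangle \ne 0$. A "natural" choice of
c which insures hereditary positive-definiteness whenever $\langle y, s \rangle > 0$ is
$c = y$. (Since $\langle y, s \rangle \approx \langle \nabla^2 f(x^*) s, s \rangle$ near $x^*$, one expects
$\langle y, s \rangle$ to be positive near $x^*$.) The resulting update formula is that used
in the Davidon-Fletcher-Powell (DFP) method. Denoting by $\bar{B}_{DFP}$ the updated
matrix obtained from B by applying this formula, one has

\[ \bar{B}_{DFP} = B + \frac{(y - Bs)y^T + y(y - Bs)^T}{\langle y, s \rangle} - \frac{\langle y - Bs, s \rangle\, yy^T}{\langle y, s \rangle^2}
 = \left( I - \frac{ys^T}{\langle y, s \rangle} \right) B \left( I - \frac{sy^T}{\langle y, s \rangle} \right) + \frac{yy^T}{\langle y, s \rangle}. \]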

As with Broyden's method, one can show that the DFP iterates converge
superlinearly to $x^*$ whenever $x_0$ and $B_0$ are sufficiently near $x^*$ and
$\nabla^2 f(x^*)$, respectively, and that each iteration requires n function
evaluations and $O(n^2)$ arithmetic operations. Although the DFP update
formula is a bit more complicated than the Broyden update formula, experience
has shown that the DFP method is generally superior to Broyden's method for
problems in unconstrained minimization.
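
The properties claimed for the DFP update (agreement of its two written forms, the quasi-Newton equation, symmetry, and positive-definiteness when $\langle y, s \rangle > 0$) can be checked on a small example. The sketch below assumes NumPy; the function names and the sample data are hypothetical.

```python
import numpy as np

def dfp_update(B, s, y):
    """DFP update in product form: (I - y s^T/<y,s>) B (I - s y^T/<y,s>) + y y^T/<y,s>."""
    ys = y @ s
    left = np.eye(len(s)) - np.outer(y, s) / ys
    return left @ B @ left.T + np.outer(y, y) / ys

def dfp_update_expanded(B, s, y):
    """The same update written as the general rank-two formula with c = y."""
    ys = y @ s
    r = y - B @ s
    return B + (np.outer(r, y) + np.outer(y, r)) / ys - (r @ s) * np.outer(y, y) / ys ** 2

# Hypothetical symmetric positive-definite B and a pair (s, y) with <y, s> > 0.
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
s = np.array([1.0, 2.0, -1.0])
y = np.array([6.1, 5.8, 0.3])

B_dfp = dfp_update(B, s, y)
smallest_eigenvalue = np.linalg.eigvalsh(B_dfp).min()
```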
At the (k+1)-st iteration, both Broyden's method and the DFP method
require first the determination of $B_k^{-1}F(x_k)$, then the updating of $B_k$.
It is natural to ask whether a more efficient method might be obtained by
applying an update formula directly to $B_k^{-1}$. If we denote $B^{-1}$ by $H$
and $\bar{B}^{-1}$ by $\bar{H}$, the quasi-Newton equation $\bar{B}s = y$ becomes
$s = \bar{H}y$. Carrying out a development completely analogous to that leading
to the DFP update formula yields the update formula of the
Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. Denoting by $\bar{H}_{BFGS}$ the
updated matrix obtained from $H$ by applying this formula, one has

\[ \bar{H}_{BFGS} = \left( I - \frac{sy^T}{\langle y, s \rangle} \right) H \left( I - \frac{ys^T}{\langle y, s \rangle} \right) + \frac{ss^T}{\langle y, s \rangle}. \]

It is not difficult to see that, as in the case of the DFP update, this
update adds a rank-two correction term to $H$ and guarantees hereditary symmetry
and, if $\langle y, s \rangle > 0$, positive-definiteness. Again, it can be shown that the
BFGS iterates converge superlinearly to $x^*$ whenever $x_0$ and $H_0$ are
sufficiently near $x^*$ and $\nabla^2 f(x^*)^{-1}$, respectively. It is clear that each
iteration requires n function evaluations and $O(n^2)$ arithmetic operations.
The BFGS method is not the same as the DFP method. In fact,

\[ \bar{H}_{BFGS} = \bar{H}_{DFP} + vv^T, \]

where $\bar{H}_{DFP} = \bar{B}_{DFP}^{-1}$ and

\[ v = \langle y, Hy \rangle^{1/2} \left[ \frac{s}{\langle s, y \rangle} - \frac{Hy}{\langle y, Hy \rangle} \right]. \]

According to [3], there is "growing evidence that BFGS is the best current
update formula for use in unconstrained minimization".
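
The relation between the two updates can be verified numerically. The sketch below assumes NumPy; the function names and the sample data are hypothetical, and the DFP update is written in its standard inverse form $H + ss^T/\langle s, y \rangle - Hyy^TH/\langle y, Hy \rangle$, which is the inverse of the DFP update of $B = H^{-1}$.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update of H: (I - s y^T/<y,s>) H (I - y s^T/<y,s>) + s s^T/<y,s>."""
    ys = y @ s
    left = np.eye(len(s)) - np.outer(s, y) / ys
    return left @ H @ left.T + np.outer(s, s) / ys

def dfp_inverse_update(H, s, y):
    """DFP update expressed for H = B^{-1} (H is assumed symmetric)."""
    Hy = H @ y
    return H + np.outer(s, s) / (y @ s) - np.outer(Hy, Hy) / (y @ Hy)

# Hypothetical data: H is the inverse of a symmetric positive-definite matrix.
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
H = np.linalg.inv(B)
s = np.array([1.0, 2.0, -1.0])
y = np.array([6.1, 5.8, 0.3])

H_bfgs = bfgs_inverse_update(H, s, y)
H_dfp = dfp_inverse_update(H, s, y)
Hy = H @ y
v = np.sqrt(y @ Hy) * (s / (s @ y) - Hy / (y @ Hy))
```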
6. A potential application

We conclude this note by comparing the properties of quasi-Newton methods
to those of Newton's method and UHMLE in a potential application to the
problem of obtaining maximum-likelihood estimates of the parameters in mixture
distributions. Such estimates, of course, play a fundamental role in certain
approaches to signature extension, estimation of proportions, and clustering.
For a description of the UHMLE algorithm, see [6] and [7].

Let X be an n-dimensional random variable with probability density
function

\[ p(x) = \sum_{i=1}^{m} \alpha_i^0\, p_i(x), \]

where

\[ p_i(x) = \frac{1}{(2\pi)^{n/2}\, |\Sigma_i^0|^{1/2}} \exp\left[ -\tfrac{1}{2} (x - \mu_i^0)^T (\Sigma_i^0)^{-1} (x - \mu_i^0) \right] \]

and the proportions $\alpha_i^0$ are positive and sum to 1. Suppose that
$\{x_k\}_{k=1,\ldots,N}$ is a sample of independent observations on X. By a
maximum-likelihood estimate of the parameters
$\{\alpha_i^0, \mu_i^0, \Sigma_i^0\}_{i=1,\ldots,m}$, we mean a choice of parameters
$\{\alpha_i, \mu_i, \Sigma_i\}_{i=1,\ldots,m}$ which locally maximizes the
log-likelihood function

\[ L = \sum_{k=1}^{N} \log p(x_k), \]

regarded as a function of the parameters $\{\alpha_i, \mu_i, \Sigma_i\}_{i=1,\ldots,m}$.
It is known that, loosely speaking, there is a unique strongly-consistent
maximum-likelihood estimate. (See [7] for a clarification and proof of this
statement.)
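
For concreteness, the log-likelihood function L can be evaluated as in the following sketch, assuming NumPy. The function name `mixture_log_likelihood` and its argument layout are hypothetical; the densities are computed directly, which is adequate for illustration but not numerically robust for very small density values.

```python
import numpy as np

def mixture_log_likelihood(X, alphas, mus, sigmas):
    """L = sum_k log p(x_k) for a mixture of normal densities.

    X is an N x n array of observations; alphas, mus, sigmas hold the
    proportions, mean vectors and covariance matrices of the m components.
    """
    N, n = X.shape
    density = np.zeros(N)
    for a, mu, sigma in zip(alphas, mus, sigmas):
        d = X - mu
        # Quadratic form (x_k - mu)^T sigma^{-1} (x_k - mu) for every k.
        quad = np.einsum("ki,ki->k", d @ np.linalg.inv(sigma), d)
        norm = (2 * np.pi) ** (-n / 2) * np.linalg.det(sigma) ** (-0.5)
        density += a * norm * np.exp(-0.5 * quad)
    return np.sum(np.log(density))

# Hypothetical one-dimensional check with two identical standard normal components.
X = np.array([[0.0], [1.0]])
L = mixture_log_likelihood(X, [0.3, 0.7],
                           [np.zeros(1), np.zeros(1)],
                           [np.eye(1), np.eye(1)])
```

Each call sums over all N observations and all m components, which is the per-iteration cost that the discussion of large samples refers to.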

The problem which we consider here is to approximate numerically the
strongly-consistent maximum-likelihood estimate. This is potentially a very
difficult problem. Indeed, the number of independent variables is
$(m - 1) + mn + m\,\frac{n(n+1)}{2}$, a number which may be very large. Furthermore,
the evaluation of functions derived from the log-likelihood function usually
involves summation over the entire sample of N observations and, hence, is
a source of computational difficulty when the sample is large. In the table
below, we list the key properties of UHMLE, Newton's method, and quasi-Newton
methods when applied to solving likelihood equations obtained by
differentiating the log-likelihood function. It should be noted that, in
addition to the arithmetic operations listed in the table, each method requires
at each iteration the evaluation of the functions $p_i(x_k)$, $i = 1, \ldots, m$,
$k = 1, \ldots, N$.


METHOD                  CONVERGENCE    ARITHMETIC OPERATIONS PER ITERATION

UHMLE                   Linear         $O(mn^2N)$
Newton's Method         Quadratic      $O_1(m^2n^4N) + O_2(m^3n^6)$
Quasi-Newton Methods    Superlinear    $O_1(mn^2N) + O_2(m^2n^4)$


Of course, many factors must be considered in addition to convergence
rates and the amount of arithmetic per iteration when deciding what sort of
algorithm is best suited in a particular instance for application to the
problem under consideration. For example, UHMLE is a type of gradient
method; hence, one might expect UHMLE to enjoy the relatively good global
convergence behavior usually associated with gradient methods. Furthermore,
gradient methods are often competitive in speed of convergence with Newton's
method and quasi-Newton methods when only "ball-park" approximations to the
solution are desired. Since the nearness of the maximum-likelihood estimate
to the true parameters will be limited by the variance of the sample
observations, "ball-park" approximations will certainly suffice except, perhaps,
in the case of a very large sample.

It is difficult to predict circumstances in which the advantage of fast
convergence for Newton's method and quasi-Newton methods will outweigh the
disadvantage of having to perform a great many arithmetic operations at each
iteration with these methods. However, it should be noted that if N is
very large relative to m and n, then the number of arithmetic operations
per iteration required by quasi-Newton methods is comparable to the number
required by UHMLE. Also, if N is very large, one might reasonably want
to obtain very accurate approximations of the maximum-likelihood estimate,
in which case the superlinear convergence of quasi-Newton methods is clearly
preferable to the linear convergence of UHMLE. Consequently, if N is very
large relative to m and n and if particularly accurate approximations of
the maximum-likelihood estimate are desired, then quasi-Newton methods appear
to have a clear-cut advantage over UHMLE. In such circumstances, one might
retain the good global properties of UHMLE by employing a hybrid method
which initially behaves like UHMLE and then behaves increasingly like a
quasi-Newton method as the iteration proceeds.


ON N-th ROOTS OF POSITIVE OPERATORS
by D.R. Brown and M.J. O'Malley^1


A bounded operator A on a Hilbert space H is positive
provided $\langle Ax, x \rangle \ge 0$ for all $x \in H$. These operators are
symmetric, and as such constitute a natural generalization of
non-negative real diagonal matrices. The following result is
thus both well known and not surprising:

Theorem: A positive operator has a unique positive square root
(under operator composition).

This may be established by integration of the correct
function, invoking the spectral theorem for self-adjoint operators.
A more accessible argument for those not acquainted with the mysteries
of spectral measures may be found in [1, p. 317].

While square roots and their iterates seem to provide a sufficient
analytic tool for most purposes, it is also a (folk) theorem that
positive operators possess unique positive n-th roots for every
positive integer n. As in the n = 2 case, existence follows from an
application of the spectral theorem; however, we give an argument in the
spirit of [1]. The purpose in so doing is not to exercise the reader's
knowledge of induction, but rather to illustrate another use of the Law of
the Mean as a motivational instrument.


1) Both authors received partial support under NASA contract NAS-9-15000. 
Let I be the identity operator on H, and let B(H)
denote the set of bounded operators on H. We will need
the following properties of positive operators:

(1) The relation on positive operators defined by $A \le B$
    if and only if $B - A$ is positive, is reflexive,
    transitive, and consistent with the notation $0 \le A$
    for any positive A; moreover, this relation is preserved
    by operator addition and positive real scalar
    multiplication, and reversed by negative scalar
    multiplication.

(2) If A and B are positive and if AB = BA, then AB is
    positive.

(3) If $0 \le A \le I$, then $0 \le I - A \le I$.

(4) If $0 \le A$, then $A \le \|A\| I$, so that $\|A\|^{-1} A \le I$ if $A \ne 0$.

(5) If $0 \le A \le I$, then $A^n \le A$ for all positive integers n.

We also require:

Lemma. If $\{S_n\}$ is a sequence in B(H) such that $0 \le S_n \le S_{n+1} \le I$,
then there exists $S \in B(H)$ such that $S_n u \to S u$ for all $u \in H$.

All of the conclusions above are verified by straightforward
arguments in [1, pp. 317-320].

Theorem: Let $A \in B(H)$, $0 \le A$, and let k be a positive integer.
Then there exists a unique positive operator B such that $B^k = A$.

Proof: By (4) above, we need only consider the case in which $A \le I$.
We first prove the existence of B. Since the theorem is a tautology
for all operators when k = 1, we assume the existence of positive
(k-1)-st roots for all positive operators.

Under the momentary supposition that B exists, let
$R = I - A$ and $S = I - B$. Then $(I - S)^k = I - R$, so that

\[ S = (1/k) \left[ R + \sum_{r=2}^{k} \binom{k}{r} (-1)^r S^r \right]. \tag{*} \]

Clearly the existence of a positive operator satisfying this
implicit relation is necessary and sufficient to establish the
existence of the desired operator B. To this end, we define a
sequence of operators by $S_0 = 0$,

\[ S_{n+1} = (1/k) \left[ R + \sum_{r=2}^{k} \binom{k}{r} (-1)^r S_n^r \right]. \]

In order to show $S_n \le S_{n+1}$, it suffices to show, under the assumption
$0 \le S_{n-1} \le S_n \le I$, that

\[ 0 \le S_{n+1} - S_n = (1/k) \left[ \sum_{r=2}^{k} \binom{k}{r} (-1)^r \left( S_n^r - S_{n-1}^r \right) \right]. \]
r-/ n n-1 

To accomplish this, we digress to a consideration of the
polynomial

\[ f(x) = \sum_{r=2}^{k} \binom{k}{r} (-1)^r x^r = (1 - x)^k + kx - 1. \]

Since $f'(x) = k[1 - (1 - x)^{k-1}] \ge 0$ on [0,1], clearly f is
increasing on this interval. To translate this to operators, it is
necessary to examine the situation more carefully. By the Mean Value
Theorem, given $0 \le y < z \le 1$, there exists a (unique) number $c \in (y, z)$
such that

\[ f(z) - f(y) = f'(c)(z - y). \tag{**} \]

Upon solving,

\[ c = 1 - \left[ (1/k) \sum_{r=0}^{k-1} (1 - y)^{k-1-r} (1 - z)^r \right]^{1/(k-1)}. \]


Returning to our operator problem, we wish to apply this
information to the sequence $\{S_n\}$. Since all members of this
family are polynomials in $R = I - A$, any two of them commute.
This is a property sufficient to permit imitation of equation (**)
with operators; let $z = S_n$, $y = S_{n-1}$. In this format, we use C
to represent the operator $I - J$, where J is (any) positive
(k-1)-st root of the operator

\[ (1/k) \sum_{r=0}^{k-1} (I - S_{n-1})^{k-1-r} (I - S_n)^r. \]

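The following chain of equalities is easily calculated:

\[ S_{n+1} - S_n = (1/k) \left( f(S_n) - f(S_{n-1}) \right) \]
\[ = (1/k) \left\{ k \left[ I - (I - C)^{k-1} \right] \right\} (S_n - S_{n-1}) \]
\[ = \left[ I - (I - C)^{k-1} \right] (S_n - S_{n-1}) \]
\[ = \left[ I - J^{k-1} \right] (S_n - S_{n-1}) \]
\[ = \left[ I - (1/k) \sum_{r=0}^{k-1} (I - S_{n-1})^{k-r-1} (I - S_n)^r \right] (S_n - S_{n-1}). \]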

By application of remarks (2), (3) and (5), the assumption of
existence of (k-1)-st roots, and the inductive hypothesis $S_{n-1} \le S_n$,
the latter operator product exists and is positive. Hence $S_n \le S_{n+1}$,
and the sequence is increasing. Of course, the Law of the Mean
is not applicable in this setting, nor is it used other than to motivate
the choice of C. Indeed, the discerning reader will note that the
extremes of the chain above may be shown to be equal without the
introduction of C. However, the rather unusual factorization of
$S_{n+1} - S_n$ would be more difficult to discover without the example
furnished by the derivative in the real function situation.

To invoke the Lemma and complete the proof of existence of
k-th roots, it remains to show $S_n \le I$ for all n. Assuming
$0 \le S_m \le I$, we have

\[ kS_{m+1} = R + \sum_{r=2}^{k} \binom{k}{r} (-1)^r S_m^r = R - I + kS_m + (I - S_m)^k. \]

By remark (5), $(I - S_m)^k \le I - S_m$; therefore

\[ R + kS_m - I + (I - S_m)^k \le R + kS_m - I + I - S_m = R + (k-1)S_m \le I + (k-1)S_m \le kI. \]

Hence $kS_{m+1} \le kI$ and $S_{m+1} \le I$, as desired. Thus, the Lemma gives an
operator S as in (*), and $I - S = B$ is a k-th root of A.
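
In finite dimensions the construction can be tried out directly. The sketch below, assuming NumPy, iterates $S_{n+1} = (1/k)[R - I + kS_n + (I - S_n)^k]$ (the closed form of the sum used above) for a hypothetical symmetric matrix A with eigenvalues in (0, 1]; the function name `kth_root` and the fixed iteration count are choices made for this example.

```python
import numpy as np

def kth_root(A, k, iterations=200):
    """Approximate the positive k-th root of a symmetric matrix A with 0 <= A <= I."""
    n = A.shape[0]
    I = np.eye(n)
    R = I - A
    S = np.zeros((n, n))                     # S_0 = 0
    for _ in range(iterations):
        # S_{n+1} = (1/k) [ R - I + k S_n + (I - S_n)^k ]
        S = (R - I + k * S + np.linalg.matrix_power(I - S, k)) / k
    return I - S                             # B = I - S

# Hypothetical example: a 2 x 2 symmetric matrix with eigenvalues in (0, 1).
A = np.array([[0.6, 0.2], [0.2, 0.5]])
B = kth_root(A, 3)
cube = B @ B @ B
```

The convergence is only linear, and it slows down as the smallest eigenvalue of A approaches zero, so the iteration is of interest as an illustration of the proof rather than as a practical algorithm.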

In order to prove the uniqueness of a positive k-th root of A,
we first observe that if T is any positive k-th root of A, then T
must perforce commute with A, hence with $I - A = R$, hence with each
$S_n$, and thus with S and $I - S = B$. Let $u \in H$, $v = (B - T)u$.
Then

\[ 0 = \langle (B^k - T^k)u, v \rangle = \left\langle \sum_{r=0}^{k-1} B^{k-r-1} T^r (B - T)u, v \right\rangle = \sum_{r=0}^{k-1} \langle B^{k-r-1} T^r v, v \rangle. \]

Since B and T commute, $0 \le B^{k-r-1} T^r$, whence $\langle B^{k-r-1} T^r v, v \rangle = 0$,
$r = 0, 1, \ldots, k-1$. Let $F_r$ be any positive (hence symmetric) square root
of $B^{k-r-1} T^r$. Then

\[ \|F_r v\|^2 = \langle F_r v, F_r v \rangle = \langle F_r^2 v, v \rangle = 0, \]

so that $F_r v = 0$ and $B^{k-r-1} T^r v = F_r (F_r v) = 0$.
Therefore $B^{k-r-1} T^r (B - T)u = 0$, or

\[ B^{k-r} T^r u = B^{k-r-1} T^{r+1} u, \qquad r = 0, 1, \ldots, k-1. \]

In particular, for r = k-1, $BT^{k-1} = T^k$. Multiplying by T, we have
$B^{k+1} = BA = BT^k = T^{k+1}$.

If k = 2, the argument above shows $Bv = 0 = Tv$, whence
$\|(B - T)u\|^2 = \langle (B - T)^2 u, u \rangle = \langle (B - T)v, u \rangle = 0$. Hence $Bu = Tu$ for all
$u \in H$, and B is thus unique. Now assume all positive roots, of order
less than k, for positive operators are unique. If k = 2j, then
$(B^2)^j = B^k = T^k = (T^2)^j$, whence $B^2 = T^2$ and thus $B = T$. If
k is odd, we have shown above that $B^{k+1} = T^{k+1}$, so, by the even
exponent argument, again $B = T$. This completes the proof.
A FIXED POINT THEOREM FOR CERTAIN OPERATOR VALUED MAPS

by D.R. Brown and M.J. O'Malley^1


1. Introduction. Let H be a real Hilbert space, and let $B_1(H)$ denote
the space of symmetric, bounded operators on H which have numerical range
in [0,1], topologized by the strong operator topology (that is, the topology
of point-wise convergence). It is well known [3] that if $T \in B_1(H)$, then
there exists a unique $S \in B_1(H)$ such that $S^2 = T$. We represent S by
$T^{1/2}$. The following theorem is due to John Neuberger [2].

Theorem A: Suppose $w \in H$, P is an orthogonal projection on H, and L is
a (strongly) continuous function from H into $B_1(H)$. Let $Q_0 = P$, and set

\[ Q_{n+1} = Q_n^{1/2}\, L(Q_n^{1/2} w)\, Q_n^{1/2}, \qquad n = 0, 1, 2, \ldots \]

Then $\{Q_n\}_{n=0}^{\infty}$ converges to an element $Q \in B_1(H)$ for which
$z = Q^{1/2} w$ is a fixed point of P and a fixed point of L
in the sense that L(z)z = z.
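
A finite-dimensional illustration of the iteration in Theorem A is sketched below, assuming NumPy. The helper names, the choice of H as three-dimensional Euclidean space, and in particular the constant map L (equal to a fixed orthogonal projection) are hypothetical simplifications made only so that the limit can be read off by hand.

```python
import numpy as np

def psd_sqrt(Q):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(Q)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def neuberger_iteration(P, L, w, steps=100):
    """Iterate Q_{n+1} = Q_n^{1/2} L(Q_n^{1/2} w) Q_n^{1/2} starting from Q_0 = P."""
    Q = P.copy()
    for _ in range(steps):
        root = psd_sqrt(Q)
        Q = root @ L(root @ w) @ root
    return Q

# Hypothetical example: P projects onto the plane spanned by e1 and e2, and L is
# constant, equal to the projection onto the plane spanned by e1 and (e2 + e3)/sqrt(2).
P = np.diag([1.0, 1.0, 0.0])
u = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
P2 = np.outer([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]) + np.outer(u, u)
w = np.array([2.0, 1.0, -3.0])

Q = neuberger_iteration(P, lambda x: P2, w)
z = psd_sqrt(Q) @ w
```

Here the two planes meet in the line through e1, and z comes out as the component of w along that line, which is fixed by both P and L(z).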

In this paper, under the same hypotheses as Theorem A, we develop a
family of Neuberger-like results to find points $z \in H$ satisfying L(z)z = z
and P(z) = z. This family includes Neuberger's theorem and has the additional
property that "most" of the sequences $\{Q_n\}$ converge to idempotent elements
of $B_1(H)$. The limit operator of Theorem A need not be idempotent.

Such theorems as those above not only play a valuable role in the search
for numerical solutions of partial differential equations, but are also useful,
in the finite-dimensional case, in attacking the problem of determining the nonzero
fixed points of a function $\phi: R^n \to R^n$. In particular, if $x \in R^n - \{0\}$, then
x is a fixed point of $\phi$ if and only if $A(x)x = x$, where A is the matrix
valued function defined by $A(x) = \|x\|^{-2}\, \phi(x)\, x^T$. In fact, it follows
that this can occur if and only if A(x) is a nonzero symmetric idempotent.

^1 Both authors received partial support under NASA contract NAS-9-15000.

It is a pleasure to record our indebtedness to H.P. Decell for the remark
immediately above, and to several other members of the University of Houston
Mathematics Department, particularly Phillip Walker, for helpful conversations
regarding the preparation of this paper.


2. Fixed Points of L(z). Recall that an operator $A$ is positive if
$\langle Ax,x\rangle \ge 0$ for all $x \in H$, where $\langle\;,\;\rangle$ is
the inner product of $H$. We presume familiarity with the standard properties
of positive operators as set forth, for example, in [3]. By invocation of the
Spectral Theorem, or, alternately, by a sequential construction, it is possible
to provide, for any $T \in B_1(H)$ and any positive integer $n$, a unique
operator $T^{1/n} \in B_1(H)$ such that $(T^{1/n})^n = T$. This notion extends
immediately to arbitrary positive rational powers of $T$ by defining
$T^{r/s} = (T^{1/s})^r$. Moreover, by again appealing to the Spectral Theorem,
it follows that if $\{Q_n\}$ is a sequence in $B_1(H)$ converging strongly to
$Q$, and $t$ is an arbitrary positive rational number, then $\{Q_n^t\}$
converges strongly to $Q^t$. Finally, recall that the usual quasi-order defined
for positive operators by $A \le B$ if and only if $B - A$ is positive
satisfies an additional anti-symmetry condition, to wit: if $A$ and $B$ are
positive and commute, then $A \le B$ and $B \le A$ forces $A = B$.


Lemma 1. Let $Q \in B_1(H)$ and let $\alpha$ be a positive rational number
other than 1. If $Q^{\alpha} = Q$, then $Q = Q^2$; that is, $Q$ is an
idempotent.

Proof: Let $\alpha = r/s$; the presumed equality is equivalent to
$Q^r = Q^s$. Without loss of generality, assume $r < s$ and that $r$ is the
minimal positive power of $Q$ which reoccurs in the sequence
$\{Q^n\}_{n=1}^{\infty}$. From the fact that powers of an operator descend in
the quasi-order mentioned above, together with the limited anti-symmetry of
this relation, it follows that $Q^t = Q^r$ for all integral $t$ between $r$
and $s$. From $Q^{r+1} = Q^r$ it follows that $Q^t = Q^r$ for all $t \ge r$.
If $r$ is odd, then $(Q^{(r+1)/2})^2 = Q^{r+1} = Q^{2r} = (Q^r)^2$. By
uniqueness of square roots, $Q^r = Q^{(r+1)/2}$, whence $r = (r+1)/2$ and
$r = 1$. If $r$ is even, then $(Q^{r/2})^2 = Q^r = Q^{2r} = (Q^r)^2$, whence
$r = r/2$, which is impossible for positive $r$. Thus $r = 1$ and $Q = Q^2$.
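For intuition only, the one-dimensional case of Lemma 1 can be written out directly. When $H = R$ the space $B_1(H)$ is the interval $[0,1]$, and the lemma says that a number $q \in [0,1]$ with $q^{\alpha} = q$ for some $\alpha \neq 1$ must be 0 or 1. This scalar check is an illustration, not a substitute for the operator argument above.

```latex
% Case q = 0 is trivial. For q in (0,1] and \alpha \neq 1:
q^{\alpha} = q
\;\Longrightarrow\; q^{\alpha - 1} = 1
\;\Longrightarrow\; (\alpha - 1)\,\ln q = 0
\;\Longrightarrow\; \ln q = 0
\;\Longrightarrow\; q = 1 .
% Hence q \in \{0, 1\}, and in either case q = q^{2}.
```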

We are now ready to prove our

Theorem 2. Let $w \in H$, let $P$ be an orthogonal projection on $H$, and
let $L: H \to B_1(H)$ be strongly continuous. Let $\alpha, \beta$ be positive
rational numbers with $\alpha \in [1/2, \infty)$. Set $Q_0 = P$, and let
$Q_{n+1} = Q_n^{\alpha} L(Q_n^{\beta} w) Q_n^{\alpha}$, $n = 0,1,2,\ldots$.
Then $\{Q_n\}_{n=0}^{\infty}$ is a decreasing sequence of elements of $B_1(H)$
which converges to an element $Q \in B_1(H)$ such that

(1) if $\alpha > 1/2$, then $Q$ is idempotent and $z = Qw$ satisfies
$L(z)z = z$ and $Pz = z$, and

(2) if $\alpha = 1/2$ and $\beta \ge 1/2$, then $z = Q^{\beta} w$ satisfies
$L(z)z = z$ and $Pz = z$.
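The iteration in Theorem 2 is easy to experiment with when every operator involved is diagonal, since powers and products then act entrywise. The following is a minimal sketch under that assumption; the function `iterate_diag`, the map `L_example`, and the vector `w` are hypothetical names and values chosen for illustration and do not come from the paper.

```python
def iterate_diag(L, w, p, alpha, beta, steps=200):
    """Run Q_{n+1} = Q_n^alpha L(Q_n^beta w) Q_n^alpha for diagonal operators.

    Diagonal operators are stored as lists of their diagonal entries.
    `p` is the diagonal of the projection P (entries 0 or 1), and `L` maps a
    vector to a diagonal with entries in [0, 1].
    """
    q = list(p)  # Q_0 = P
    for _ in range(steps):
        arg = [qi ** beta * wi for qi, wi in zip(q, w)]  # Q_n^beta w
        l = L(arg)
        q = [qi ** alpha * li * qi ** alpha for qi, li in zip(q, l)]
    return q


def L_example(z):
    # Hypothetical continuous L: the first coordinate is always kept,
    # the second is damped by a factor in (0, 1] depending on |z[1]|.
    return [1.0, 1.0 / (1.0 + abs(z[1]))]


w = [2.0, 3.0]
q = iterate_diag(L_example, w, p=[1.0, 1.0], alpha=1.0, beta=1.0)
z = [qi * wi for qi, wi in zip(q, w)]  # alpha > 1/2, so z = Q w
```

With these choices the second diagonal entry decreases to 0, so the computed limit is the idempotent with diagonal (1, 0), and `z` satisfies `L_example(z)` applied to `z` equals `z`, in line with part (1) of the theorem.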

Proof: Fix $\alpha \ge 1/2$, $\beta > 0$. Since $Q_0 = P \in B_1(H)$ and the
range of $L$ is in $B_1(H)$, it follows inductively that $Q_n \in B_1(H)$ for
all $n$. Since $2\alpha \ge 1$, $Q_n^{2\alpha} \le Q_n$; moreover,
$\langle (Q_n^{2\alpha} - Q_n^{\alpha} L(Q_n^{\beta} w) Q_n^{\alpha})x, x\rangle
= \langle Q_n^{\alpha}(I - L(Q_n^{\beta} w))Q_n^{\alpha} x, x\rangle
= \langle (I - L(Q_n^{\beta} w))Q_n^{\alpha} x, Q_n^{\alpha} x\rangle$.
Thus, since $I - L(Q_n^{\beta} w) \ge 0$, it follows that
$Q_{n+1} \le Q_n^{2\alpha}$. Hence we have

(*)    $Q_{n+1} \le Q_n^{2\alpha} \le Q_n$,  $n = 0,1,2,\ldots$.

In particular, the sequence $\{Q_n\}$ is monotonically decreasing in the
(operator) interval from $0$ to $I$. Thus we have by [3, p-31S] that the
sequence converges strongly to an element $Q \in B_1(H)$, whence
$\{Q_n^{\alpha}\}$ converges to $Q^{\alpha}$ and $\{Q_n^{\beta}\}$ converges to
$Q^{\beta}$. Since $L$ is continuous and operator multiplication is jointly
continuous in the strong topology on $B_1(H)$, we have by uniqueness of limits
that $Q = Q^{\alpha} L(Q^{\beta} w) Q^{\alpha}$. Also, from (*) and the closed
graph of the relation $\le$, we have $Q \le Q^{2\alpha} \le Q$. Thus, since
$Q$ and $Q^{2\alpha}$ commute, we have that $Q = Q^{2\alpha}$. Moreover, since
$P = Q_0$, we have $PQ_n = Q_n$, whence $PQ^{\gamma} = Q^{\gamma}$ for all
positive rational $\gamma$.

(1) Suppose $\alpha > 1/2$. By Lemma 1 (applied with the exponent
$2\alpha \ne 1$), $Q = Q^2$, from which it follows that $Q^{\gamma} = Q$ for
all positive rational $\gamma$, and, in particular, $Q = QL(Qw)Q$. Let
$z = Qw$, and fix $x \in H$. Then
$\langle Qx,x\rangle = \langle QL(z)Qx,x\rangle = \langle L(z)Qx,Qx\rangle$,
and since $Q^2 = Q$, it follows that
$0 = \langle Qx,Qx\rangle - \langle L(z)Qx,Qx\rangle
= \langle (I - L(z))Qx,Qx\rangle$.
Therefore, since $I - L(z)$ and hence $(I - L(z))^{1/2}$ belong to $B_1(H)$,
we have that $Q = L(z)Q$. In particular, $z = Qw = L(z)Qw = L(z)z$.

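(2) Suppose $\alpha = 1/2$, $\beta \ge 1/2$. Let $z = Q^{\beta} w$; then
$Q = Q^{1/2} L(z) Q^{1/2}$, from which
$\langle Qx,x\rangle = \langle Q^{1/2} L(z) Q^{1/2} x, x\rangle
= \langle L(z) Q^{1/2} x, Q^{1/2} x\rangle$.
Since $\langle Qx,x\rangle = \langle Q^{1/2} x, Q^{1/2} x\rangle$ also, we have
$0 = \langle Q^{1/2} x - L(z) Q^{1/2} x, Q^{1/2} x\rangle
= \langle (I - L(z)) Q^{1/2} x, Q^{1/2} x\rangle$.
Now, as in (1), it follows that $Q^{1/2} = L(z) Q^{1/2}$. In particular,
$z = Q^{\beta} w = Q^{1/2} Q^{\beta - 1/2} w = L(z) Q^{1/2} Q^{\beta - 1/2} w
= L(z) Q^{\beta} w = L(z)z$.
That $Pz = z$ in both cases is obvious from the fact that
$PQ^{\gamma} = Q^{\gamma}$ for all positive rational $\gamma$. This completes
the proof.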

Given a nonzero element $z \in H$ such that $L(z)z = z$, it is reasonable
to ask if our sequences are able to produce $z$. We note now that, by proper
selection of $w$ and $P$, $z$ is attainable from each of our sequences.
Specifically, if $\alpha$ and $\beta$ are fixed as in the theorem, then let
$w = z$ and let $P$ be the orthogonal projection of $H$ onto the line through
$z$. From the construction of the sequence, $Q_1 = PL(z)P$, whence $Q_1 = P$.
It follows immediately that $Q_n = P$ for all $n$ and thus $Q = P$. Hence
$z = Qw = Pw$ (or $z = Q^{\beta} w = P^{\beta} w = Pw$) is the fixed point
yielded by our theorem. While it is not reasonable to expect the practitioner
to guess $P$ so accurately, these remarks do attach the virtue of theoretical
completeness to these processes.
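The remark above can be checked numerically in the plane. The sketch below assumes a hypothetical value of $L$ at $z = (1,1)$, namely a symmetric matrix with eigenvalue 1 on $z$ and eigenvalue 1/2 on $(1,-1)$; the helper names `matmul` and `matvec` are likewise illustrative. Since $P$ is a projection, $P^{\alpha} = P^{\beta} = P$ and $P^{\beta} w = Pz = z$, so the first step of the iteration is $Q_1 = P L(z) P$.

```python
def matmul(a, b):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(a, v):
    # Product of a 2x2 matrix with a vector of length 2.
    return [sum(a[i][k] * v[k] for k in range(2)) for i in range(2)]


z = [1.0, 1.0]

# Hypothetical value of L at z: symmetric, numerical range in [0, 1],
# and L(z) z = z.
Lz = [[0.75, 0.25], [0.25, 0.75]]

# Orthogonal projection onto the line through z: z z^T / <z, z>.
nz = sum(c * c for c in z)
P = [[z[i] * z[j] / nz for j in range(2)] for i in range(2)]

# First step of the iteration with w = z and Q_0 = P.
Q1 = matmul(matmul(P, Lz), P)
```

Here `Q1` coincides with `P`, so the sequence is constant and the fixed point returned is `matvec(P, z)`, which is `z` itself.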


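3. Examples. (1) Suppose that $\alpha = 1/2$ and that
$\gamma, \delta \in [1/2, \infty)$ are such that neither of $\gamma, \delta$
is an integral multiple of the other. We show that for fixed $w \in H$ and
$P$, the $Q$ and $z$ obtained by using $\gamma$ for $\beta$ need not be the
same as those obtained by using $\delta$ for $\beta$. Moreover, the limit
operator $Q$ in this case need not be an idempotent, although it can be one.
Assume $\delta < \gamma$. Let $k$ be the least positive integer such that
$\gamma < k\delta$. Note $2 \le k$ and $(k-1)\delta < \gamma$. Let $a$ be any
number in the interval $(0,1)$. Then

$a^{k\delta} < a^{\gamma} < a^{(k-1)\delta} \le a^{\delta}$.

Define $L: R \to [0,1]$ by

$$L(x) = \begin{cases}
1, & x < a^{\gamma}, \\
\dfrac{a - 1}{a^{(k-1)\delta} - a^{\gamma}}\,(x - a^{\gamma}) + 1, & a^{\gamma} \le x \le a^{(k-1)\delta}, \\
a, & a^{(k-1)\delta} < x,
\end{cases}$$

where the slope of the middle piece is the one that makes $L$ continuous.

Set $P = 1$, $w = 1$. Using $\gamma$ for $\beta$ in the theorem yields
$Q_0 = 1$ and $Q_1 = a$. Inductively, $Q_n = a$, so that $Q = a$. Hence
$z = Q^{\gamma} w = a^{\gamma} \cdot 1 = a^{\gamma}$ in this case. On the
other hand, using $\delta$ for $\beta$ gives $Q_0 = 1$, $Q_1 = a$, but
$Q_2 = a^2, \ldots, Q_k = a^k$. Moreover, $Q_n = a^k$ for $n \ge k$, hence
$Q = a^k$ and $z = Q^{\delta} w = a^{k\delta} \cdot 1 = a^{k\delta}$. By the
choices of $a$ and $k$, the exponents $\gamma$ and $\delta$ yield distinct
operators and distinct fixed points. Moreover, neither of the limit operators
determined by $\gamma$ and $\delta$ is idempotent.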
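Example (1) is one-dimensional, so it can be replayed with ordinary floating-point numbers. The sketch below assumes the piecewise-linear form of $L$ (equal to 1 up to $a^{\gamma}$, equal to $a$ from $a^{(k-1)\delta}$ on, and linear in between), and picks the hypothetical values $a = 0.5$, $\gamma = 1.25$, $\delta = 0.5$; the names `make_L` and `run` are illustrative. With $\alpha = 1/2$ and scalars, $Q_n^{1/2} L(\cdot) Q_n^{1/2}$ is simply $Q_n L(\cdot)$.

```python
def make_L(a, gamma, delta):
    # k is the least positive integer with gamma < k * delta.
    k = 1
    while gamma >= k * delta:
        k += 1
    lo = a ** gamma                # L = 1 at and below this point
    hi = a ** ((k - 1) * delta)    # L = a at and above this point
    slope = (a - 1.0) / (hi - lo)  # linear interpolation keeps L continuous

    def L(x):
        if x <= lo:
            return 1.0
        if x >= hi:
            return a
        return slope * (x - lo) + 1.0

    return L, k


def run(L, beta, w=1.0, steps=50):
    q = 1.0                        # Q_0 = P = 1
    for _ in range(steps):
        q = q * L(q ** beta * w)   # alpha = 1/2 in the scalar case
    return q, q ** beta * w        # (limit Q, fixed point z = Q^beta w)


a, gamma, delta = 0.5, 1.25, 0.5
L, k = make_L(a, gamma, delta)
q_gamma, z_gamma = run(L, gamma)   # beta = gamma
q_delta, z_delta = run(L, delta)   # beta = delta
```

For these values the two runs settle at different limits, approximately $a$ and $a^k$, and at different fixed points, approximately $a^{\gamma}$ and $a^{k\delta}$, matching the pattern described in the example.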


(2) Suppose that $\alpha > 1/2$, so that any limiting $Q$ obtained through
the theorem is idempotent. We show for fixed $w \in H$ and $P$ that the
resulting limit idempotents may vary with the choice of $\beta$, as may the
fixed points determined in this manner. To this end, let $\alpha = 1$ in the
theorem. Let $L: R^3 \to B_1(R^3)$ be as follows: all image matrices are
diagonal, where

$$\begin{pmatrix} x & 0 & 0 \\ 0 & y & 0 \\ 0 & 0 & z \end{pmatrix}$$

will be represented as $\mathrm{diag}(x,y,z)$. We require
$L(1,1,1) = \mathrm{diag}(1,1/2,1)$,
$L(1,1/\sqrt{2},1) = \mathrm{diag}(1,1/4,0)$,
$L(1,1/2,1) = \mathrm{diag}(0,1/4,1)$,
$L(1,y,z) = \mathrm{diag}(1,y,z)$ for $(y,z) \in [0,1/4] \times [0,1/4]$, and
$L(x,y,1) = \mathrm{diag}(x,y,1)$ for $(x,y) \in [0,1/4] \times [0,1/4]$.
The extension theorem of Tietze (cf. [1]) permits a continuous extension of
$L$ to all of $R^3$ into the diagonal matrices whose entries are in the
interval $[0,1]$. Let $P = I$, the identity operator, and let $w$ be the
vector $(1,1,1)$. If $\beta = 1/2$, a brief examination of the defining
sequence of $Q_n$'s in Theorem 2 shows that the limit idempotent
$Q = \mathrm{diag}(1,0,0)$, and $z = Qw = (1,0,0)$. On the other hand, if
$\beta = 1$, then the limit $Q = \mathrm{diag}(0,0,1)$, and $z = (0,0,1)$.

(3) With notation as in (2), suppose $\beta = 1$ is fixed. We show for
fixed $w \in H$ and $P$ that the resulting limit idempotents may vary with
$\alpha$, as may the fixed points determined in this manner. Letting $P = I$
and $w = (1,1,1)$ as in (2), we require this time that
$L(1,1,1) = L(1,1/2,1) = \mathrm{diag}(1,1/2,1)$,
$L(1,1/8,1) = L(1,0,0) = \mathrm{diag}(1,0,0)$, and
$L(1,1/32,1) = L(0,0,1) = \mathrm{diag}(0,0,1)$.
Extending as before, we have a continuous $L$ defined on $R^3$ into the
diagonal matrices with entries in $[0,1]$. For any choice of $\alpha$,
$Q_1 = \mathrm{diag}(1,1/2,1)$. If $\alpha = 1$,
$Q_2 = \mathrm{diag}(1,1/8,1)$, $Q_3 = Q = \mathrm{diag}(1,0,0)$, and
$z = (1,0,0)$. On the other hand, if $\alpha = 2$, then
$Q_2 = \mathrm{diag}(1,1/32,1)$, $Q_3 = Q = \mathrm{diag}(0,0,1)$, and
$z = (0,0,1)$.
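The arithmetic of Example (3) can be replayed exactly with rational numbers. The sketch below tabulates $L$ only at the six points named in the example, which are the only points the iteration visits for $\alpha = 1$ and $\alpha = 2$; the continuous extension elsewhere is not needed for this check. The names `L_TABLE`, `step`, and `limit` are illustrative, and diagonal matrices are stored as tuples of their diagonal entries.

```python
from fractions import Fraction as F

# Values of L at the points prescribed in Example (3), as diag entries.
L_TABLE = {
    (F(1), F(1), F(1)): (F(1), F(1, 2), F(1)),
    (F(1), F(1, 2), F(1)): (F(1), F(1, 2), F(1)),
    (F(1), F(1, 8), F(1)): (F(1), F(0), F(0)),
    (F(1), F(0), F(0)): (F(1), F(0), F(0)),
    (F(1), F(1, 32), F(1)): (F(0), F(0), F(1)),
    (F(0), F(0), F(1)): (F(0), F(0), F(1)),
}

W = (F(1), F(1), F(1))


def step(q, alpha):
    # One step Q_{n+1} = Q_n^alpha L(Q_n w) Q_n^alpha with beta = 1.
    arg = tuple(qi * wi for qi, wi in zip(q, W))
    l = L_TABLE[arg]
    return tuple(qi ** alpha * li * qi ** alpha for qi, li in zip(q, l))


def limit(alpha, steps=6):
    q = (F(1), F(1), F(1))  # Q_0 = P = I
    history = [q]
    for _ in range(steps):
        q = step(q, alpha)
        history.append(q)
    return q, history


q1, hist1 = limit(1)  # alpha = 1
q2, hist2 = limit(2)  # alpha = 2
```

The histories reproduce the intermediate operators stated in the example: the second entry of $Q_2$ is 1/8 for $\alpha = 1$ and 1/32 for $\alpha = 2$, after which the two runs land on different idempotents.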

It is easy to see that a slightly more complicated definition of L would 
yield a single example incorporating the features of all three prior illustrations. 